(54) Title: METHOD AND DEVICE FOR PROCESSING A MULTI-CHANNEL SIGNAL FOR USE WITH A HEADPHONE




(57) Abstract 

A method and device processes multi-channel audio signals, each channel corresponding to a loudspeaker placed in a particular
location in a room, in such a way as to create, over headphones, the sensation of multiple "phantom" loudspeakers placed throughout
the room. Head Related Transfer Functions (HRTFs) are chosen according to the elevation and azimuth of each intended loudspeaker
relative to the listener, each channel being filtered with an HRTF such that when combined into left and right channels and played over
headphones, the listener senses that the sound is actually produced by phantom loudspeakers placed throughout the "virtual" room. A
database collection of sets of HRTF coefficients from numerous individuals and subsequent matching of the best HRTF set to the individual
listener provides the listener with listening sensations similar to that which the listener, as an individual, would experience when listening
to multiple loudspeakers placed throughout the room. An appropriate transfer function applied to the right and left channel output allows
the sensation of open-ear listening to be experienced through closed-ear headphones.

METHOD AND DEVICE FOR PROCESSING A MULTI-CHANNEL
SIGNAL FOR USE WITH A HEADPHONE

Background of the Invention 
Field of the Invention. The present invention relates to a method and device for processing
a multi-channel audio signal for reproduction over headphones. In particular, the present invention
relates to an apparatus and method for creating, over headphones, the sensation of multiple
"phantom" loudspeakers in a user-matched virtual listening environment.

Background Information. In an attempt to provide a more realistic or engulfing listening
experience in the movie theater, several companies have developed multi-channel audio formats.
Each audio channel of the multi-channel signal is routed to one of several loudspeakers distributed 
throughout the theater, providing movie-goers with the sensation that sounds are originating all 
around them. At least one of these formats, for example the Dolby Pro Logic® format, has been 
adapted for use in the home entertainment industry. The Dolby Pro Logic® format is now in wide 
use in home theater systems. As with the theater version, each audio channel of the multi-channel 
signal is routed to one of several loudspeakers placed around the room, providing home listeners with 
the sensation that sounds are originating all around them. As the home entertainment system market 
expands, other multi-channel systems will likely become available to home consumers.

When humans listen to sounds produced by loudspeakers, it is termed open-ear listening.
Open-ear listening occurs when the ears are uncovered. It is the way we listen in everyday life. In
an open-ear environment, the sonic information arriving at the ears provides cues about the location
and distance of the sound source. Humans are able to localize a sound to the right or left based on
differences in the arrival times and differences in the sound levels at the two ears. Other subtle
differences in the spectrum of the sound at each ear drum provide cues about the sound source
elevation and front/back location. These differences are related to the filtering effects of several
body parts, most notably the head and the pinnae of the ears.

The process of listening while the outer surface of the ear is covered (e.g., with
headphones) is termed closed-ear listening. Covering the ear changes the ear canal resonance
characteristics. Due to the physical effects of wearing headphones, sound delivered through
headphones lacks the subtle differences in time, level, and spectra caused by location, distance, and
the filtering effects of the head and pinnae experienced in open-ear listening. Thus, when headphones
are used with multi-channel home entertainment systems, the advantages of listening via numerous
loudspeakers placed throughout the room are lost, the sound often appearing to be originating inside
the listener's head.

There is a need for a system that can process multi-channel audio in such a way as to cause
the listener to sense multiple "phantom" loudspeakers when listening over headphones. Such a
system should process each channel such that the effects of loudspeaker location and distance
intended to be created by each channel signal, as well as the filtering effects of the head and
pinnae, are preserved or simulated accurately for that individual listener.

Accordingly, an object of the present invention is to provide a method for processing the
multi-channel output typically produced by home entertainment or like systems such that when
presented over headphones, the listener is able to select a best match set of head related transfer
functions from a database of measured head related transfer functions, such that
the listener experiences the sensation of multiple "phantom" loudspeakers placed throughout the
room.

Another object of the present invention is to provide an apparatus for processing the
multi-channel output typically produced by home entertainment or like systems such that when presented
over headphones, the listener experiences listening sensations most like that which the listener, as
an individual, would experience when listening to multiple loudspeakers placed throughout the room.

Another object of the present invention is to provide an apparatus for processing the
multi-channel output typically produced by home entertainment or like systems such that when presented
over headphones, the listener experiences sensations typical of open-ear (unobstructed) listening.

Another object of the present invention is to provide an apparatus and method for measuring
the acoustic filtering action produced by the head and pinnae of the human ears so as to produce a
useful database of head related transfer functions.

Another object of the present invention is to create a database of HRTFs representative of
the general listening public by measuring and recording a large enough set of such HRTFs such that
any given individual is likely to be able to select a set of HRTFs from the database so that when used
to process an audio signal the user perceives the corresponding sounds to be localized in the proper
spatial positions.

Another object of the present invention is to provide a means of determining the
"best-match" of an individual listener to one of the HRTF sets of the representative database such that the
individual listener can be matched as closely as possible to an already measured set of HRTFs stored
in a database, such that once properly matched, the individual will experience the correct "phantom"
locations of the sources of the listening system.

Another object of the present invention is to provide a wired or wireless transmission system
for dimensionalized listening of sound over headphones.

Other objects of the invention will become clear from a review of the complete disclosure.

Summary of the Invention
According to the present invention, multiple channels of an audio signal are processed
through the application of filtering using a head related transfer function (HRTF) or a plurality of
HRTFs, selected by a user, such that when reduced to two channels, left and right, each channel
contains information that enables the listener to sense the location of multiple phantom loudspeakers
when listening over headphones.

Also according to the present invention, multiple channels of an audio signal are processed
through the application of filtering using HRTFs chosen from a large database such that when
listening through headphones, the listener experiences a sensation that most closely matches the
sensation the listener, as an individual, would experience when listening to multiple loudspeakers.

In another exemplary embodiment of the present invention, the right and left channels are 
filtered in order to simulate the effects of open-ear listening. 

In another exemplary embodiment of the present invention, a complete set of HRTFs for an
individual is measured and recorded, such that the measured HRTFs are an accurate reflection of the
filtering effects of that individual's head and pinnae, and in which the measurement takes on the order
of a few minutes. For each individual, several hundred HRTFs are measured such that an HRTF is
specified for each location in space about the listener with an accuracy of approximately 10° in both
the vertical and horizontal dimensions.

In a further embodiment of this invention, the HRTFs of a sufficient number of individuals
are measured and stored to create a database such that a given individual is able to select a set of
HRTFs from the database such that when audio signals are processed with the selected set of
HRTFs, the user perceives the corresponding sounds to be localized in the proper spatial positions.
In a further embodiment, the database of HRTFs comprises a representative set of HRTF
sets.

In another exemplary embodiment of the present invention, an individual is matched to a
"best-match" set of HRTFs selected from a database of sets of HRTFs measured from a
representative sample of the general listening population, where the individual listener participates
in the matching of the set of HRTFs by comparing the perception created by different HRTF sets and
selecting the HRTF set providing the best spatial perception.

In another exemplary embodiment of the present invention, a database of HRTF sets,
measured from a representative sample of the listening population, is established, such that an
individual can select a "best-match" set of HRTFs from the database.

In a further embodiment, a best match set of HRTFs is selected from the database of HRTFs
and is used to process signals for wired or wireless transmission to a listener wearing headphones.

Brief Description of the Drawings

Figure 1 is a representation of sound waves received at both ears of a listener sitting in a
room with a typical multi-channel loudspeaker configuration.

Figure 2 is a representation of the listening sensation experienced through headphones
according to an exemplary embodiment of the present invention.

Figure 3A shows the sound source locations used to measure a set of head related transfer
functions (HRTFs) obtained at multiple elevations and azimuths surrounding a listener.

Figure 3B is a graph representing the HRTF for 0 degrees elevation and 30 degrees azimuth
for three different individuals.

Figure 4 is a schematic in block diagram form of a typical multi-channel headphone
processing system according to an exemplary embodiment of the present invention. 

Figure 5 is a schematic in block diagram form of a bass boost circuit according to an 
exemplary embodiment of the present invention. 

Figure 6A is a schematic in block diagram form of HRTF filtering as applied to a single
channel according to an exemplary embodiment of the present invention.

Figure 6B is a schematic in block diagram form of the process of HRTF matching based
on an ordered set of HRTFs according to the present invention.

Figure 7 is a representation of a typical digital signal transmission system comprising a
transmitting station, a connecting medium called a channel and a receiving station.

Figure 8A is a block diagram of a novel radio-frequency transmission system for use in a
wireless embodiment of this invention.

Figure 8B is a representation of an adaptive filter for removing the DC component of a
digital signal.

Figure 9A shows a computer simulated input Gaussian noise source with a variance of 2.5
mV and a mean of 0.5 V.

Figure 9B shows the tracking constant, C[k], during a computer simulation of the removal
of the DC component of an input Gaussian noise source by an adaptive filter.

Figure 9C shows the output of an adaptive filter where the input is a Gaussian noise source.

Figures 9D and 9E show the magnitude frequency response of the input Gaussian noise
waveform and DC shifted output.

Figure 9F is a schematic of a state machine.

Figure 9G is a timing diagram of various clock outputs for decoding signals encoded
according to one embodiment of this invention.

Figure 10 depicts an HRTF matching process according to the present invention.

Figure 11 shows an impulse response waveform recorded from one individual at one spatial
location for one ear.

Figure 12 illustrates critical band filtering according to the present invention.

Figure 13 illustrates an exemplary subject filtered HRTF matrix according to the present
invention.

Figure 14 illustrates a hypothetical hierarchical agglomerative clustering procedure in two
dimensions according to the present invention.

Figure 15 illustrates a hypothetical hierarchical agglomerative clustering procedure
according to an exemplary embodiment of the present invention.

Figure 16 is a schematic in block diagram form of a typical reverberation processor
constructed of parallel lowpass comb filters.

Figure 17 is a schematic in block diagram form of a typical lowpass comb filter.

Figure 18A is a schematic of a preferred embodiment of an HRTF measurement means.

Figure 18B further illustrates a preferred embodiment of an HRTF measurement means.

Figure 19 is a schematic representation of the HRTF measurement control system.

Figure 20 is a schematic representation of the HRTF measurement control system software
flow chart.

Figure 21A is a schematic representation of a front view of a sound room in which HRTFs
may be measured to produce the database of HRTFs of this invention.

Figure 21B is a schematic representation of a top view of the sound room.

Figure 21C shows the detail of the cross section of the wall of the sound room.

Figure 22A shows the probability that the RMS distance, between any individual's HRTF
and the nearest HRTF already in the database, is less than a certain RMS distance (dB), as a function
of the number of HRTF sets in the database.

Figure 22B shows the cumulative density function of the distance between each of 150
HRTFs and the mean HRTF.

Figure 22C shows the change in average mean as a function of subsample group size.

Figure 22D shows the change in average standard deviation as a function of subsample
group size.

Figure 22E shows the mean minimum distance between any HRTF set of the 150 HRTF
sets and one of the stored HRTF sets as a function of the number of stored HRTF sets.

Figures 23A, B, C are block diagrams of a circuit according to this invention for processing
signals using a best match set of HRTFs selected by a user from the database of this invention.

Figure 24 is a detail of an early reflection processing circuit 612 according to Figure 23.

Figure 25 is a detail of an HRTF processing circuit 663 according to Figure 23 comprising
finite impulse response filters that implement HRTFs selected from the database of this invention.

Figure 26 is a detail of a reverberation circuit 671 according to Figure 23.

Figure 27 is a detail of a bass boost processing circuit 670 according to Figure 23.

Figures 28A, B, C are a schematic representation of the HRTF selection and matching
performed by a user to arrive at a best match set of HRTFs which is then used for processing of
audio signals according to Figures 25 and 23.

Figure 29A, B is an alternate embodiment to that disclosed in Figures 28A, B, and C.

Detailed Description of the Invention
The method and device according to the present invention processes audio signals, including
multi-channel audio signals having a plurality of channels, each corresponding to a loudspeaker
placed in a particular location in a room, in such a way as to create, over headphones, the sensation
of multiple "phantom" loudspeakers placed throughout the room. The present invention utilizes
Head Related Transfer Functions (HRTFs) that are chosen according to the elevation and azimuth
of each intended loudspeaker relative to the listener, each channel being filtered by a set of HRTFs
such that when combined into left and right channels and played over headphones, the listener
senses that the sound is actually produced by phantom loudspeakers placed throughout the "virtual"
room.

The filtering of the present invention utilizes a database collection of sets of HRTFs
measured from numerous individuals and subsequent matching of the best HRTF set to an individual
listener, thus providing the listener with listening sensations similar to that which the listener, as an
individual, would experience when listening to multiple loudspeakers placed throughout the room.
Additionally, the present invention utilizes an appropriate transfer function applied to the right and
left channel output so that the sensation of open-ear listening may be experienced through closed-ear
headphones.

In generating the database collection of sets of HRTFs, the present invention also provides
a measurement device and method for measuring and recording complete sets of HRTFs of subjects
from a representative sample of the listening population, such that the measured HRTFs are an
accurate reflection of the filtering effects of the head and pinnae of each of the subjects measured.
For each individual, as many as 360 HRTFs for each ear may be measured, with each HRTF
depending on the position or location of the sound source with respect to the listener. These
measured HRTF sets are stored in a database, such that the database provides HRTF sets from which
any individual can select a set of HRTFs such that when audio signals are processed with the selected
set of HRTFs, the user perceives the corresponding sounds to be localized in the proper spatial
positions, to thereby achieve optimized 3D virtual audio effects when using headphones.

Figure 1 depicts the path of sound waves received at both ears of a listener according to a
typical embodiment of a home entertainment system. The multi-channel audio signal is decoded into
multiple channels, i.e., a two-channel encoded signal is decoded into a multi-channel signal in
accordance with, for example, the Dolby Pro Logic® format. Each channel of the multi-channel
signal is then played, for example, through its associated loudspeaker, e.g., one of five loudspeakers:
left; right; center; left surround; and right surround. The effect is the sensation that sound is
originating all around the listener.

Figure 2 depicts the listening experience created by an exemplary embodiment of the present
invention. As described in detail with respect to Figure 4, the present invention processes each
channel of a multi-channel signal using a set of HRTFs appropriate for the distance and location of
each phantom loudspeaker (e.g., the intended loudspeaker for each channel) relative to the listener's
left and right ears. All resulting left ear channels are summed, and all resulting right ear channels
are summed, producing two channels, left and right. Each channel is then preferably filtered using
a transfer function that introduces the effects of open-ear listening. When the two-channel output
is presented via headphones, the listener senses that the sound is originating from five phantom
loudspeakers placed throughout the room, as indicated in Figure 2.

The manner in which the ears and head filter sound may be described by a Head Related
Transfer Function (HRTF). An HRTF is a transfer function obtained from one individual for one
ear for a specific sound source location. An HRTF is described by multiple coefficients that
characterize how sound produced at a particular spatial position should be filtered to simulate the
filtering effects of the head and outer ear of a particular individual. HRTFs are typically measured
at various elevations and azimuths. Typical HRTF measurement locations are illustrated in Figure
3A.

In Figure 3A, the horizontal plane located at the center of the listener's head 100 represents
0.0° elevation. The vertical plane extending forward from the center of the head 100 represents 0.0°
azimuth. HRTF locations are defined by a pair of elevation and azimuth coordinates and are
represented by a small sphere 110. In one embodiment of this invention, HRTFs are measured in
10 degree intervals for the azimuth and 10 degree intervals for the elevation from 30 degrees below
the horizon to 60 degrees above the horizon. Associated with each sphere 110 is a set of HRTF
coefficients that represent the transfer function for that sound source location. Each sphere 110 is
actually associated with two HRTFs, one for each ear.
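
The size of the measurement grid described above can be checked with a short calculation. The following Python sketch enumerates the elevation and azimuth pairs; it assumes that the azimuth steps cover the full circle around the listener with 0 and 360 degrees treated as the same location, and the function and constant names are illustrative only. Under that assumption the grid holds 360 locations, which agrees with the figure of as many as 360 HRTFs for each ear.

```python
# Enumerate the measurement grid: azimuth every 10 degrees around the
# listener, elevation every 10 degrees from -30 to +60.
# Assumption: azimuth spans the full circle, with 0 and 360 coinciding.
AZIMUTH_STEP_DEG = 10
ELEVATION_STEP_DEG = 10
ELEVATION_MIN_DEG = -30   # 30 degrees below the horizon
ELEVATION_MAX_DEG = 60    # 60 degrees above the horizon


def measurement_grid():
    """Return a list of (elevation, azimuth) pairs in degrees."""
    azimuths = range(0, 360, AZIMUTH_STEP_DEG)
    elevations = range(ELEVATION_MIN_DEG,
                       ELEVATION_MAX_DEG + 1,
                       ELEVATION_STEP_DEG)
    return [(el, az) for el in elevations for az in azimuths]


grid = measurement_grid()
locations = len(grid)               # 10 elevations x 36 azimuths
hrtfs_per_subject = 2 * locations   # one HRTF per ear at each location
print(locations, hrtfs_per_subject)
```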

Because no two humans have identical heads and pinnae, no two humans have HRTFs which
are exactly alike. This fact is demonstrated in Figure 3B, which shows a graph representing the
HRTF for 0 degrees elevation and 30 degrees azimuth for three different individuals. As can be
seen, each of these individuals has quite different HRTFs. Therefore, for each individual, it is critical
to use a set of HRTFs for filtering audio signals such that when the audio signals are filtered, the user
perceives the corresponding sounds to be localized in the proper positions, in order to optimally
create the sensation that the particular signal originates from the location which is intended by the
HRTF processing. There have been some efforts to use a "universal" set of HRTFs, wherein every
user is presented with the same set of HRTFs, having some average characteristics. However, as one
can see from Figure 3B, a "universal" set of HRTFs would give very different sensations to each of
the three individuals depicted. For instance, if an individual's HRTF had a peak (or valley) at a
frequency f, while the universal HRTF had a contradictory valley (or peak) at the same frequency
f, the individual would interpret the directional cues of the signal incorrectly. These inaccurate or
poorly matched HRTFs degrade the overall 3D perception of the individual, the amount of
degradation depending on the individual. This was experimentally demonstrated by Wightman and
Kistler (1993).
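
One way to put a number on such a mismatch is an RMS difference, in decibels, between two magnitude responses sampled at the same frequencies. The Python sketch below is an illustration only: this passage does not give a formula, so the definition used here is an assumption, and the magnitude values are hypothetical rather than measured data.

```python
import math


def rms_db_distance(mag_a, mag_b):
    """RMS difference in dB between two magnitude responses.

    Both inputs are linear magnitudes sampled at the same frequencies.
    This definition is an assumption made for illustration.
    """
    if len(mag_a) != len(mag_b) or not mag_a:
        raise ValueError("responses must be non-empty and equally long")
    diffs_db = [20.0 * math.log10(a / b) for a, b in zip(mag_a, mag_b)]
    return math.sqrt(sum(d * d for d in diffs_db) / len(diffs_db))


# Hypothetical magnitudes at four frequencies.
individual = [1.0, 2.0, 0.5, 1.0]   # peak followed by a valley
universal = [1.0, 0.5, 2.0, 1.0]    # contradictory valley, then peak

mismatch_db = rms_db_distance(individual, universal)
print(round(mismatch_db, 2))
```

A peak in one response set against a valley in the other contributes a large term at that frequency, so contradictory features dominate the distance.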



PCT/US97/0014S 






In order to improve performance beyond the use of a single or "universal" HRTF and to
overcome the impracticalities of measuring an individual set of HRTFs for each individual, the
present invention provides a database of HRTFs collected from a measured group of the general
population. For example, the HRTFs are collected from individuals of both sexes with
varying physical characteristics. The present invention then employs a unique process whereby the
sets of HRTFs obtained from all individuals are organized into an ordered fashion and stored in a
read-only memory (ROM) or other storage device. An HRTF matching processor enables each user
to select, from the sets of HRTFs stored in the ROM, a set of HRTFs such that when audio signals
are processed with the selected set of HRTFs, the user perceives the corresponding sounds to be
localized in the proper spatial positions.

An exemplary embodiment of the present invention is illustrated in Figure 4. After the
multi-channel signal has been decoded into its constituent channels, for example channels 1, 2, 3,
4 and 5 in the Dolby Pro Logic® format, selected channels are processed via an optional bass boost
circuit 6. For example, channels 1, 2 and 3 are processed by the bass boost circuit 6. Output
channels 9 from the bass boost circuit 6, as well as channels 4 and 5, are then each
electronically processed to create the sensation of a phantom loudspeaker for each channel.

Processing of each channel is accomplished through digital filtering using sets of HRTF
coefficients, for example via HRTF processing circuits 10, 11, 12, 13 and 14. The HRTF processing
circuits can include, for example, a suitably programmed digital signal processor. A best match
between the listener and a set of HRTFs is selected via the HRTF matching processor 59. Based on
the best match set of HRTFs, a preferred pair of HRTFs, one for each ear, is selected for each
channel as a function of the intended loudspeaker position of each channel of the multi-channel
signal. In an exemplary embodiment of the present invention, the best match set of HRTFs is
selected from an ordered set of HRTFs stored in ROM 65 via the HRTF matching processor 59 and
routed to the appropriate HRTF processor 10, 11, 12, 13 and 14.

Prior to the listener selecting a best match set of HRTFs, sets of HRTFs stored in the HRTF
database are processed by an HRTF ordering processor 64 such that they may be stored in ROM
65 in an ordered sequence to optimize the matching process via HRTF matching processor 59. Once
the optimal pair of HRTFs for each channel has been selected by the listener, separate HRTFs are
applied for the right and left ears, converting each input channel to dual channel output.

Each channel of the dual channel output from, for example, the HRTF processing circuit 10
is multiplied by a scaling factor as shown, for example, at nodes 16 and 17. This scaling factor
reflects signal attenuation as a function of the distance between the phantom loudspeaker and the
listener's ear. All right ear channels are summed at node 26. All left ear channels are summed at
node 27. The output of nodes 26 and 27 results in two channels, left and right respectively, each of
which contains signal information necessary to provide the sensation of left, right, center, and rear
loudspeakers intended to be created by each channel of the multi-channel signal, but now configured
to be presented over conventional two transducer headphones.
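
The filtering, scaling, and summing path described above can be sketched in a few lines of Python. This is a minimal illustration, not the circuit of Figure 4: the function names, the channel names, the filter taps, and the distances are hypothetical, and a simple reference-distance-over-distance gain is assumed for the scaling factor, since the text states only that the factor depends on distance.

```python
def convolve(signal, taps):
    """Direct-form FIR filtering (full linear convolution)."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(taps):
            out[n + k] += x * h
    return out


def accumulate(total, values, gain):
    """Add gain * values into total in place, growing total if needed."""
    if len(values) > len(total):
        total.extend([0.0] * (len(values) - len(total)))
    for i, v in enumerate(values):
        total[i] += gain * v


def mix_to_binaural(channels, hrtf_pairs, distances, ref_distance=1.0):
    """Filter each channel with its (left, right) HRTF pair, scale, and sum.

    channels:   name -> list of samples
    hrtf_pairs: name -> (left_ear_taps, right_ear_taps)
    distances:  name -> phantom loudspeaker distance (same unit as ref)
    """
    left, right = [], []
    for name, samples in channels.items():
        left_taps, right_taps = hrtf_pairs[name]
        gain = ref_distance / distances[name]   # assumed 1/d attenuation
        accumulate(left, convolve(samples, left_taps), gain)
        accumulate(right, convolve(samples, right_taps), gain)
    return left, right


# Hypothetical two-channel example with two-tap "HRTFs".
channels = {"center": [1.0, 0.0], "left": [0.0, 1.0]}
hrtf_pairs = {
    "center": ([0.5, 0.25], [0.5, 0.25]),
    "left": ([1.0, 0.0], [0.25, 0.5]),
}
distances = {"center": 1.0, "left": 2.0}

left_out, right_out = mix_to_binaural(channels, hrtf_pairs, distances)
print(left_out, right_out)
```

Each input channel yields one left ear and one right ear signal, so any number of phantom loudspeakers reduces to the two outputs a headphone needs.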

Additionally, parallel reverberation processing may optionally be performed on one or more
channels by reverberation circuit 15. In a free-field, the sound signal that reaches the ear includes
information transmitted directly from each sound source as well as information reflected off of
surfaces such as walls and ceilings. Sound information that is reflected off of surfaces is delayed in
its arrival at the ear relative to sound that travels directly to the ear. In order to simulate surface
reflection, at least one channel of the multi-channel signal would be routed to the reverberation
circuit 15, as shown in Figure 4.

In an exemplary embodiment of the present invention, one or more channels are routed
through the reverberation circuit 15. The circuit 15 includes, for example, numerous lowpass comb
filters in parallel configuration. This is illustrated in Figure 16. The input channel is routed to
lowpass comb filters 140, 141, 142, 143, 144 and 145. Each of these filters is designed, as is known
in the art, to introduce the delays associated with reflection off of room surfaces. The output of the
lowpass comb filters is summed at node 146 and passed through an allpass filter 147. The output
of the allpass filter is separated into two channels, left and right. A gain, g, is applied to the left
channel at node 147. An inverse gain, -g, is applied to the right channel at node 148. The gain g
allows the relative proportions of direct and reverberated sounds to be adjusted.

Figure 17 illustrates an exemplary embodiment of a lowpass comb filter 140. The input to
the comb filter is summed with filtered output from the comb filter at node 150. The summed signal
is routed through the comb filter 151 where it is delayed D samples. The output of the comb filter
is routed to node 146, shown in Figure 16, and also summed with feedback from the lowpass filter
153 loop at node 152. The summed signal is then input to the lowpass filter 153. The output of the
lowpass filter 153 is then routed back through both the comb filter and the lowpass filter, with gains
g1 and g2 applied at nodes 154 and 155, respectively.
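
The reverberation path described above, parallel lowpass comb filters followed by an allpass filter and a split into left and right with gains g and -g, can be sketched in Python as follows. This uses a common textbook form of the lowpass-feedback comb and of the allpass section and is offered as an assumption about the general structure, not as the exact circuits of Figures 16 and 17. The delay lengths and gain values are hypothetical.

```python
def lowpass_comb(x, delay, g1, g2):
    """Comb filter with a one-pole lowpass in its feedback loop.

    y[n]  = w[n - delay]           (delay line output)
    lp[n] = y[n] + g2 * lp[n - 1]  (lowpass in the loop)
    w[n]  = x[n] + g1 * lp[n]      (input plus filtered feedback)
    For a decaying response, g1 / (1 - g2) should stay below 1.
    """
    line = [0.0] * delay   # circular buffer holding w
    lp = 0.0
    idx = 0
    out = []
    for sample in x:
        y = line[idx]
        lp = y + g2 * lp
        line[idx] = sample + g1 * lp
        idx = (idx + 1) % delay
        out.append(y)
    return out


def allpass(x, delay, a):
    """Allpass section: y[n] = -a*x[n] + x[n-delay] + a*y[n-delay]."""
    x_line = [0.0] * delay
    y_line = [0.0] * delay
    idx = 0
    out = []
    for sample in x:
        y = -a * sample + x_line[idx] + a * y_line[idx]
        x_line[idx] = sample
        y_line[idx] = y
        idx = (idx + 1) % delay
        out.append(y)
    return out


def reverberate(x, comb_delays, g1, g2, allpass_delay, allpass_gain, g):
    """Sum parallel lowpass combs, apply an allpass, split with +g and -g."""
    summed = [0.0] * len(x)
    for delay in comb_delays:
        for i, v in enumerate(lowpass_comb(x, delay, g1, g2)):
            summed[i] += v
    wet = allpass(summed, allpass_delay, allpass_gain)
    left = [g * v for v in wet]
    right = [-g * v for v in wet]
    return left, right


impulse = [1.0] + [0.0] * 9
comb_response = lowpass_comb(impulse[:7], delay=2, g1=0.5, g2=0.25)
rev_left, rev_right = reverberate(impulse, [2, 3], 0.5, 0.25, 1, 0.5, 0.5)
print(comb_response)
```

The impulse response of the comb shows the first echo after the chosen delay, with later echoes both weaker and smoothed by the lowpass loop, which is the behavior expected of sound reflected repeatedly off absorbing room surfaces.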

The effects of open-ear (non-obstructed) resonation are optionally added at circuit 29 in
Figure 4. The ear canal resonator according to the present invention is designed to simulate open-ear
listening via headphones by introducing the resonances and anti-resonances that are characteristic
of open-ear listening. It is generally known in the psychoacoustic art that open-ear listening
introduces certain resonances and anti-resonances into the incoming acoustic signal due to the
filtering effects of the outer ear. The characteristics of these resonances and anti-resonances are also
generally known and may be used to construct a generally known transfer function, referred to as the
open ear transfer function, that, when convolved with a digital signal, introduces these resonances
and anti-resonances into the digital signal.

Open-ear resonation circuit 29 compensates for the effects introduced by obstruction of the
outer ear via, for example, headphones. The open ear transfer function is convolved with each
channel, left and right, using, for example, a digital signal processor. The output of the open-ear
resonation circuit 29 is two audio channels 30, 31 that, when delivered through headphones, simulate
the listener's multi-loudspeaker listening experience by creating the sensation of phantom
loudspeakers throughout the simulated room in accordance with the loudspeaker layout provided by
the format of the multi-channel signal. Thus, the ear resonation circuit according to the present
invention allows for use with any headphone, thereby eliminating a need for uniquely designed
headphones.

Sound delivered to the ear via headphones is typically reduced in amplitude in the lower
frequencies. Low frequency energy may be increased, however, through the use of a bass boost
system. An exemplary embodiment of a bass boost circuit 6 is illustrated in Figure 5. Output from
selected channels of the multi-channel system is routed to the bass boost circuit 6. Low frequency
signal information is extracted by applying a low-pass filter at, for example, 100 Hz to one or
more channels, via low pass filter 34. Once the low frequency signal information is obtained, it is
multiplied by predetermined factor 35, for example k, and added to all channels via summing circuits
38, 39 and 40, thereby boosting the low frequency energy present in each channel.
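
The bass boost operation described above can be sketched in Python as follows. The sketch is illustrative only: a one-pole low-pass filter is assumed because the filter type is not specified here, the selected channels are summed before filtering, and the 44.1 kHz sample rate, the function names, and the test signals are hypothetical.

```python
import math


def one_pole_lowpass(x, cutoff_hz, sample_rate_hz):
    """Simple one-pole low-pass filter (assumed filter type)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    y = 0.0
    out = []
    for sample in x:
        y += alpha * (sample - y)
        out.append(y)
    return out


def bass_boost(channels, k, cutoff_hz=100.0, sample_rate_hz=44100.0):
    """Extract low frequencies, scale by k, and add them to every channel.

    channels: list of equally long sample lists.
    """
    n = len(channels[0])
    mono = [sum(ch[i] for ch in channels) for i in range(n)]
    low = one_pole_lowpass(mono, cutoff_hz, sample_rate_hz)
    return [[ch[i] + k * low[i] for i in range(n)] for ch in channels]


# Hypothetical input: a constant (0 Hz) signal on one channel, silence
# on the other. The low-frequency content is added to both channels.
dc_channels = [[1.0] * 5000, [0.0] * 5000]
boosted = bass_boost(dc_channels, k=0.5)
print(round(boosted[0][-1], 3), round(boosted[1][-1], 3))
```

Because the extracted low band is added to all channels, low-frequency energy that was present in only one channel is also heard on the others, which matches the summing arrangement described for circuits 38, 39 and 40.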

To create the sensation of multiple phantom loudspeakers over headphones, the HRTF
coefficients associated with the location of each phantom loudspeaker relative to the listener must
be convolved with each channel. This convolution is accomplished using a digital signal processor
and may be done in either the time or frequency domains with filter order ranging from 16 to 32 taps.
Because HRTFs differ for right and left ears, the single channel input to each HRTF processing
circuit 10, 11, 12, 13 and 14 is processed in parallel by two separate HRTFs, one for the right ear
and one for the left ear. The result is a dual channel (e.g., right and left ear) output. This process
is illustrated in Figure 6A.
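
The statement that the convolution may be done in either the time or the frequency domain can be illustrated with a short Python sketch. Both routines below compute the same linear convolution; the frequency-domain version uses a plain discrete Fourier transform written out for clarity rather than speed. The 16-tap filter and the input samples are hypothetical values chosen only for the comparison.

```python
import cmath


def convolve_time(x, h):
    """Linear convolution computed directly in the time domain."""
    out = [0.0] * (len(x) + len(h) - 1)
    for n, xv in enumerate(x):
        for k, hv in enumerate(h):
            out[n + k] += xv * hv
    return out


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), forward or inverse."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(x):
            acc += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def convolve_freq(x, h):
    """Linear convolution via zero-padding and spectrum multiplication."""
    n = len(x) + len(h) - 1
    x_spec = dft(list(x) + [0.0] * (n - len(x)))
    h_spec = dft(list(h) + [0.0] * (n - len(h)))
    y = dft([a * b for a, b in zip(x_spec, h_spec)], inverse=True)
    return [v.real for v in y]


# Hypothetical 16-tap filter and short input signal.
taps = [1.0 / (i + 1) for i in range(16)]
signal = [1.0, -0.5, 0.25, 0.0, 2.0]

y_time = convolve_time(signal, taps)
y_freq = convolve_freq(signal, taps)
max_error = max(abs(a - b) for a, b in zip(y_time, y_freq))
print(max_error < 1e-9)
```

Zero-padding both sequences to the full output length before transforming is what makes the spectrum product equal to linear rather than circular convolution.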

Figure 6A illustrates the interaction of HRTF matching processor 59 with, for example, the
HRTF processing circuit 10. Using the digital signal processor of HRTF processing circuit 10, the
signal for each channel of the multi-channel signal is convolved with two different HRTFs. For
example, Figure 6A shows the left channel signal 7 being applied to the left and right HRTF
processing circuits 43, 44 of the HRTF processing circuit 10. One set of HRTF coefficients
corresponding to the spatial location of the phantom loudspeaker relative to the left ear is applied
to signal 7 via left ear HRTF processing circuit 43, the other set of HRTF coefficients corresponding
to the spatial location of the phantom loudspeaker relative to the right ear being applied to signal
7 via the right ear HRTF processing circuit 44.

The HRTFs applied by HRTF processing circuits 43, 44 are selected from the set of HRTFs
that best matches the listener via the HRTF matching processor 59. The output of each circuit 43,
44 is multiplied by a scaling factor via, for example, nodes 16 and 17, also as shown in Figure 4.
This scaling factor is used to apply signal attenuation that corresponds to that which would be
achieved in a free field environment. The value of the scaling factor is inversely related to the
distance between the phantom loudspeaker and the listener's ear. As shown in Figure 4, the right ear
output is summed for each phantom loudspeaker via node 26, and left ear output is summed for each
phantom loudspeaker via node 27.
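
The scaling and summing stage may be sketched as below. The text states only that the scaling
factor is inversely related to distance; the 1/d law relative to a reference distance, and the
function and parameter names, are assumptions made for illustration.

```python
def mix_phantom_speakers(ear_outputs, distances, reference_distance=1.0):
    """Scale each phantom loudspeaker's output for one ear and sum the results.

    ear_outputs: one list of samples per phantom loudspeaker (same ear).
    distances:   distance from each phantom loudspeaker to that ear.
    """
    n = len(ear_outputs[0])
    mixed = [0.0] * n
    for samples, d in zip(ear_outputs, distances):
        gain = reference_distance / d  # assumed free-field 1/d attenuation
        for t in range(n):
            mixed[t] += gain * samples[t]
    return mixed

# Usage: the second loudspeaker is twice as far away, so it is halved before summing.
right_ear = mix_phantom_speakers([[1.0, 1.0], [2.0, 2.0]], distances=[1.0, 2.0])
```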

Once the left and right channel signals are processed and contain signal information 
necessary to provide the intended multi-channel sensation, the signal can be transmitted to 
conventional two transducer headphones. These signals can be transmitted by wire or wirelessly, 
for example, by a radio frequency (RF) transmission system. Examples of wireless transmission 
systems are exemplified in Examples 2, 3, and 4.

A central feature of this invention is to provide a sufficiently diverse and comprehensive set
of HRTFs so that the user can select from that set one HRTF set which will produce the perception
of sound located in the proper spatial position. This selection process is accomplished herein by:
(1) collecting a comprehensive database of HRTFs; (2) ordering the database so that a representative
subset of the entire collection of HRTFs can be obtained and stored in the device; and (3) providing
a means for a user to select from the representative subset.

As described earlier, a single HRTF (see Figure 3B) is the spectrum obtained by presenting
sound from a single location 110 (see Figure 3A). A listener's HRTF (head related transfer function)
refers to the set of HRTFs obtained from the multiple locations described, for example, in Figure 3A.
For any source location, two HRTFs are measured, one for the listener's left ear and one for the right
ear. Thus, if L locations are measured, the set of 2*L spectra represent the HRTF set for a single
listener. If S subjects are measured, an entire data base consisting of S*L*2 spectra is generated.
In one embodiment, 360 locations (L=360) were measured and HRTFs on over 150 subjects were
collected. Thus, the total data base consists of more than 108,000 spectra. These, or representative
spectra are chosen (see below), and are stored in a database 63 (see Figures 4 and 6B).

For collecting these spectra a special robot arm was constructed. Prior measurement devices
involved the use of multiple, e.g., 12, loudspeakers located on a circular hoop. Each of the multiple
loudspeakers was used to create a signal used to measure the head-ear filter characteristics. In using
these prior measurement devices, signals from each of the multiple loudspeakers were projected from
a different location to allow measurements of HRTFs for different elevations and azimuths.
However, the use of multiple loudspeakers poses a problem. To avoid contamination of the
measured HRTF, the different loudspeakers need to have equal output spectra. Unfortunately, it is
only possible to equate such spectra to within about 0.5 dB.

According to the present invention, an improved measurement method is provided by
using a single loudspeaker located at the end of a robot arm. The single loudspeaker
avoids the problem of unequal output spectra of different
loudspeakers. The single loudspeaker is precisely positioned by a computer-controlled robot arm
at each of the locations where an HRTF is to be measured. The present HRTF measurement system
can measure and record a complete set of 360 HRTFs for each ear, for an individual, in
approximately 10 to 15 minutes, as compared to one-to-four hours for prior measurement
techniques. Because the listener should remain stationary during the entire measurement session,
the speeding-up of the measurement process can, itself, contribute to the accuracy of the
measurements.

Provided in Figure 18A is a schematic of a preferred embodiment of an HRTF measurement
means according to this invention. At 200 there is provided a speaker, preferably a 4 Ohm 40 watt
speaker, for example, produced by Pioneer. At 201, there is provided a lower arm, with dimensions
approximately 1" wide, about 2" high and about 29" long. At 202, there is provided an elbow AC
servo motor, preferably capable of high rotational speeds and torque (e.g., about 20,000 rpm and
about 200 oz.-in.), and an absolute encoder (e.g., about 500 count/rev.). Affixed to the elbow AC
servo motor, there is provided an elbow planetary gearbox 203, preferably with a ratio of about
100:1 and a torque capability of about 275 in.-lb. An upper arm 212 is connected to the lower arm
201 through the elbow AC servo motor 202. At the upper end of the upper arm 212, there is
provided a shoulder spur gear pair 204, preferably having a ratio of about 11.111:1. Maintaining
the shoulder spur gear in appropriate linkage with the upper arm 212 is a mounting bracket with
bearings 205. The mounting bracket 205 is suspended from a rotation shaft 206 having a diameter
of about 1-1/4". A rotation spur gear pair 207 is provided with a ratio of about 12.8:1, to rotate the
rotation shaft 206. A rotation planetary gearbox 208, having a ratio of about 100:1 and a torque
capability of about 275 in.-lb., drives the rotation spur gear pair 207. A rotation servo motor and
associated absolute encoder 209 having a speed of about 20,000 rpm, a torque of about 200 oz.-in.,
with the encoder being amenable to 500 count/rev., are provided to actuate the rotation planetary
gearbox 208. A shoulder planetary gearbox 210, having a ratio of about 100:1 and a torque output
of about [illegible], and a shoulder servo motor 211 having a speed of about
20,000 rpm and a torque output of about 200 oz.-in. and an absolute encoder capable of about 500
count/rev., are linked to the shoulder spur gear 204 through a drive shaft 214. A wrist gearmotor
213 having a speed of about 50 rpm and a torque of about 178 oz.-in. with an associated analog
encoder are provided to position the speaker 200.

In Figure 18B, there is provided a detail of the upper arm 212, the elbow planetary gearbox
203, the elbow AC servo motor and absolute encoder 202, the mounting bracket with bearings 205,
the rotation shaft 206, the shoulder planetary gearbox 210, the shoulder servo motor and absolute
encoder 211 and the drive shaft 214.

In Figure 19, there is provided a schematic representation of the HRTF measurement control
system. This includes a central control computer 300 which, in a first loop, controls a servo
controller 301 which drives a plurality of servo amps 302a-c, which in turn drive a plurality of linked
encoder, servo motor and gearboxes 303a-c. Encoder/servo motor/gearbox 303a drives rotation,
while 303b drives the shoulder, and 303c drives the arm (see Figure 18). In a second loop, the
central control computer 300 controls data acquisition, signal presentation and speaker control via
a feedback loop comprising: an encoder/gear/motor assembly 304 for positioning the speaker 305;
an A/D converter 306, a D/A converter 307, and an attenuator 308. The feedback loop links through
an amplifier 309 to the speaker 305 and to a microphone pre-amplifier 310 and the left and right
microphones 311a and 311b. It will be appreciated that the above described hardware, and in
particular the specifics of the various motor and gear power, rotation rates and ratios are all subject
to modifications without adversely affecting the general principle of rapid, automated HRTF data
acquisition with improved accuracy.

The above described hardware may be controlled by software which controls the positioning
of the speaker. A preferred embodiment of such software is schematically represented in Figure 20.
As can be seen, the software controls system startup at 400, system initialization 401, and display
of a main menu 402. Subroutines 403-408 are provided which allow for loading of data 403,
speaker calibration 404, headphone measurement 405, performance of an HRTF test run 406,
performance of a full HRTF measurement run 407, and termination of the program 408. A
schematic of a full HRTF measurement run 407 is shown in steps 407a-407q, all of which are
initiated by selection of element 407 at the main menu. At 407a the full HRTF measurement run is
initiated, following which the measured subject is identified 407b, the robot arm is calibrated 407c,
via a feedback loop 407d which repeats arm calibration until a calibration "OK" signal 407e is
received. The robot arm is set to a zero starting position 407f, and the measurement routine is begun
407g. This includes movement of the robot arm and speaker 407h about the subject whose HRTF
sets are being measured. The acquired data is played/recorded 407i and the HRTF azimuth and
elevation is displayed 407j on a monitor. A continuous interrupt query 407k is sent and as long as
no interrupt signal is received, the measurement process is looped 407l back to measurement step
407g. If an interrupt signal is received, the system resets 407p to the main menu, 407q. If the
measurement routine is continued without interruption, a complete set of HRTFs is measured until
the natural termination of the measurement routine is reached 407m. A pause 407n is included in
the routine to allow the system to store 407o the acquired HRTFs, after which the system resets to
the main menu 407q.

The headphone measurement 405 comprises steps 405a-405h, which are initiated by
selecting this option at the main menu: at 405a, the routine is initiated, following which sounds are
played through the headphone and displayed 405b. A pause 405c is included in the routine to allow
time for data retrieval and initiation of a subroutine 405d. If a particular headphone subroutine is
not to be initiated 405e the system resets to the main menu. However, if a particular headphone
subroutine is to be initiated, a particular headphone identity is entered 405f and the data acquired
for that headphone is stored 405g following which the system resets to the main menu 405h.

Optimally, the HRTF measurements are made in an appropriately constructed sound room. 
In a preferred embodiment of this invention, the measurements are made in a room such as that 
schematically depicted in Figures 21A, 21B, and 21C. This room, shown in a front view in Figure
21A, provides an exhaust fan 500 and an air outlet channel 510. A latched door 520 is provided,
preferably with latches on both the inside and outside. A fresh air fan 530 is provided for
replenishment of fresh air from the outside of the room through an air inlet channel 540. In Figure
21B, a schematic of a top view of the sound room is provided, including a representation of the
subject seat 550, a monitoring camera 560, a pair of laser pointers 570, and sound absorbent walls
580. In Figure 21C a detail of the wall cross section is provided, showing a double wall structure
in which there are provided two layers of dry wall 581 between which there is placed a damping
material 582, preferably selected from foam rubber, polyurethane or like sound insulating material.

A further improvement in the present HRTF measurement device and method is the location
of the transducer employed to record the sound signal used in calculating the HRTF. Prior
measurement techniques attempted to measure the sound as close to the eardrum as possible, by
placing a narrow tube deep into the outer ear canal to measure the HRTF just at the eardrum.
However, through physical considerations of the nature of sound transmission and the fact that the
ear canal is small, we conclude that only a plane wave travels in the ear canal below frequencies of
about 23,000 to 26,000 hertz. Since only plane waves travel in the ear canal at these frequencies,
we expect that there is no directional information derived from the effect of the ear canal on the
incoming sound. Since no directional information is derived from propagation of the sound down
the ear canal, in the present HRTF measurement device and method, the transducer may be placed
at the entrance of the outer ear canal, instead of deep into the outer ear canal near the eardrum. In
addition to being less uncomfortable for the individual "wearing" the transducer, the external
location of the transducer provides a much higher S/N ratio than previous locations for the
transducer. This higher S/N ratio provides a more accurate HRTF, especially in the "valleys" of the
HRTF where the greatest attenuation of the incoming impulse signal exists.

The database of measured HRTFs is ordered by comparing the spectra recorded from
different individuals. This is accomplished by transforming or pre-processing the raw data to
represent the perceptual features of the raw spectra more accurately. The raw HRTFs are measured
as the impulse response to a digital signal propagated by a loudspeaker at a given location. The
signal so generated is carefully measured in the free-field (in the listener's absence) to correct for
imperfections in the spectrum of the loudspeaker. The measured impulse response is then converted
to the frequency domain using a fast Fourier transform (FFT) according to methods well known in
the art. This frequency domain representation is further processed by implementing critical-band
filtering and converting the data from a linear frequency scale to a logarithmic scale. Critical-band
filtering reflects the fact that the first stage of the auditory system contains bandpass filters whose
bandwidth is a constant fraction of the center frequency of the filter. The critical band filters
resemble 1/6 octave bandpass filters. In addition, the distance along the auditory display is roughly
proportional to the logarithm of sound frequency. Therefore, a logarithmic, rather than a linear,
frequency scale is imposed on the representation.

In an exemplary embodiment, a gammatone filter is used to perform critical band filtering.
The magnitude of the frequency response is represented by the function:

g(f) = 1 / (1 + [(f - fc)^2 / b^2])^2

where f is frequency, fc is the center frequency for the critical band and b is 1.019 ERB. ERB varies
as a function of frequency such that ERB = 24.7[4.37(fc/1000) + 1]. For each critical band filter, the
magnitude of the frequency response is calculated for each frequency, f, and is multiplied by the
magnitude of the HRTF at the same frequency, f. For each critical band filter, the results of this
calculation at all frequencies are squared and summed. The square root is then taken. This results
in one value representing the magnitude of the internal HRTF for each critical band filter.
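
The critical band calculation may be sketched as follows, assuming the fourth-order gammatone
magnitude form g(f) = 1/(1 + (f - fc)^2/b^2)^2 with b = 1.019 ERB and ERB = 24.7[4.37(fc/1000) + 1].
The function names and the frequency grid on which the HRTF magnitude is sampled are hypothetical
and serve only to illustrate the multiply, square, sum and square-root steps.

```python
import math

def erb(fc):
    """Equivalent rectangular bandwidth in Hz for center frequency fc in Hz."""
    return 24.7 * (4.37 * (fc / 1000.0) + 1.0)

def gammatone_magnitude(f, fc):
    """Assumed gammatone magnitude response at frequency f for a band centered at fc."""
    b = 1.019 * erb(fc)
    return 1.0 / (1.0 + ((f - fc) ** 2) / (b ** 2)) ** 2

def internal_hrtf_band(freqs, hrtf_mag, fc):
    """Weight the HRTF magnitude by the filter, then square, sum and take the square root."""
    total = 0.0
    for f, h in zip(freqs, hrtf_mag):
        total += (gammatone_magnitude(f, fc) * h) ** 2
    return math.sqrt(total)

# Usage: one value per critical band, here for a flat HRTF sampled every 100 Hz.
freqs = [3000.0 + 100.0 * i for i in range(151)]
band_value = internal_hrtf_band(freqs, [1.0] * len(freqs), fc=6000.0)
```

Repeating the last call for each band center frequency yields the internal HRTF for one
measured location and ear.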

The hearing system is sensitive to a fixed fractional change in signal magnitude, which is
known in the field as "Weber's Law." Thus, if stimulus magnitude is represented on a logarithmic
scale, such as decibels, the ear is sensitive to a fixed number of decibels. In sum, the internal
spectrum is represented by the level of the stimulus in decibels at about 12-18 frequencies per octave
in the range between 3 and 18 kHz. Outside this frequency range (3 to 18 kHz) the human auditory
system gains little or no directional or localization information based on the shape of the stimulus
spectrum. In fact, few listeners but the very young can hear sounds above 18,000 Hz. At the lower
frequencies, the spectrum of the signal is essentially the same for any azimuth or elevation. At the
lower frequencies, however, especially below 4 kHz, differences in time of arrival at the two ears
(interaural time cues) are important to indicate differences in the azimuthal position of the source.
Such filtering results in a new set of HRTFs, the internal HRTF, that contain the information
necessary for human listening. If, for example, the function 20 log10 is applied to the center
frequency of each critical band filter, the frequency domain representation of the internal HRTF
becomes a log spectrum that more accurately represents the perception of sound by humans.
Additionally, the number of values needed to represent the internal HRTF is reduced from that
needed to represent the unprocessed HRTF. An exemplary embodiment of the present invention
applies critical band filtering to the set of HRTFs from each individual in the HRTF database 63,
resulting in a new set of internal HRTFs. The process is illustrated in Figure 12, wherein an impulse
response waveform 80 shown in Figure 11 is filtered via a critical band filter 81 to produce the
internal HRTF 82.

The application of critical band filtering results in, for example, N logarithmic frequency
bands located in the 3000 Hz to 18,000 Hz range. Associated with each of these N frequencies is
the level in that band in decibels. In one exemplary embodiment, N=39, and the levels are measured with
a density of about 15 levels per octave. The entire data base, given S subjects and L locations, is
described by 2*S*L*N values and is illustrated in Figure 13. This pre-processing summarizes the
more salient perceptual features of the acoustic filtering produced by the head and external ear when
a listener hears a sound at a given position in space.
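
The figures quoted above can be checked with a short calculation. The sketch assumes the N=39
band centers are spaced equally on a logarithmic axis with the end points at exactly 3000 Hz and
18,000 Hz, and uses S=150 and L=360 from the exemplary embodiment; the function name is
hypothetical.

```python
import math

def log_spaced_centers(f_low=3000.0, f_high=18000.0, n=39):
    """Band centers equally spaced on a logarithmic frequency axis (assumed end points)."""
    ratio = (f_high / f_low) ** (1.0 / (n - 1))
    return [f_low * ratio ** i for i in range(n)]

centers = log_spaced_centers()

# 3000 Hz to 18,000 Hz spans log2(6), roughly 2.58 octaves, so 38 intervals
# give a density of roughly 15 levels per octave, consistent with the text.
octaves = math.log2(18000.0 / 3000.0)
density = (len(centers) - 1) / octaves

# Size of the pre-processed data base: 2 ears * S subjects * L locations * N levels.
S, L, N = 150, 360, 39
total_values = 2 * S * L * N
```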

HRTFs obtained from the different subjects and transformed or pre-processed as described
above can now be compared and organized so that their similarities and differences can be
quantified. One basic method of comparing two or more spectra is the simple Euclidean distance.
Euclidean distance is equal to the root-mean-squared (RMS) difference in decibels between the levels
measured at the same frequencies in the two or more spectra. For a collection of HRTFs obtained
from the right ear of S subjects, we can compare this set by forming a distance matrix having S rows
and S columns, in which the entry (i, j) is the distance in decibels between the internally represented
HRTF of the "ith" and "jth" individuals. Naturally, the distance measure is symmetric, so the entry
(i, j) is equal to the entry (j, i), and the distance between any individual and themselves is zero, so
the diagonal entries (i, i), where i=j, are all zero. It is on the basis of the similarities and differences
between the processed HRTFs that the database is ordered.
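
A minimal sketch of the distance matrix follows, assuming each internal HRTF is held as a list of
levels in decibels measured at the same frequencies. The function names are hypothetical.

```python
import math

def rms_distance(a, b):
    """Root-mean-squared difference in dB between two spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def distance_matrix(spectra):
    """S-by-S matrix whose entry (i, j) is the RMS distance between spectra i and j."""
    s = len(spectra)
    return [[rms_distance(spectra[i], spectra[j]) for j in range(s)] for i in range(s)]

# Usage with three short made-up spectra (levels in dB).
d = distance_matrix([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]])
```

The result is symmetric with a zero diagonal, as noted above, so only the entries above the
diagonal need be stored in practice.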

Having explained how the HRTFs are measured and preprocessed, we can now return to the
issues raised earlier about how the user of the device selects a particular HRTF from those stored
in the device. The selection process must ensure that the sound sources appear in their proper spatial
position for the individual user. Thus, the first issue to be addressed is whether the entire database
of measured HRTFs is sufficiently broad and comprehensive to represent the entire listening
population. In one exemplary embodiment, 150 HRTFs were measured from a population in which
both genders and a variety of ages and ethnicities were represented.

Statistical tests of this database suggest that 150 HRTFs constitute a set size sufficient for
the purposes of the subject invention. These tests were all conducted on a sample consisting of 150
sets measured according to this invention. Three HRTFs from each HRTF set were selected for these
comparisons, namely, on the horizon (0 elevation) and at 10, 20, and 30 degrees to the left of straight
ahead. It is expected that similar conclusions about stability would apply for other positions. Each
of the three HRTFs from each HRTF set consists, for example, of values representing the level of
the HRTF, at a plurality, e.g. 39, of different frequencies. The 39 frequencies are spaced equally,
on a logarithmic frequency axis, from about 3,000 to about 18,000 Hz. Few listeners (except the
very young) can hear sound above 18,000 Hz. The composite spectra obtained over the 3 positions
can be regarded as a vector consisting of 117 levels (dB).

To investigate the issue of database size, we constructed different sized sets of HRTFs by
drawing them at random from the original group of 150 HRTFs. Set sizes of 20, 40, 60, 80, 100,
and 120 HRTFs were constructed. For each of these randomly constructed sets, a single HRTF is
drawn at random and the distance from that individual's HRTF to its nearest neighbor is computed.
These random constructions are repeated many times so that the probability of a given distance can
be estimated. Figure 22A shows a plot of the cumulative probability of that distance for the various
different set sizes. For example, if the set size is 20, then the RMS distance in decibels to the nearest
neighbor is less than 2 dB for only about 55% of the individual HRTFs. If the set size is increased
to 40 HRTFs, then more than 70% are within 2 dB. As the set size increases to 60, 80, 100, and 120,
little incremental advantage is achieved by adding further HRTFs to the database. This analysis
demonstrates that the basic differences in HRTFs among different individuals are adequately
represented in a database having more than about 100 HRTFs. That is to say, with a raw database
containing 100-200 HRTFs there is a very high likelihood that a randomly selected individual would
find an HRTF sufficiently close to his/her own so as to properly spatialize sound.
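
The resampling procedure may be sketched as follows. The data used here are synthetic
placeholders, not measured HRTFs, and the function names, the trial count and the strict
"less than" comparison are assumptions made for the sketch; it illustrates only the mechanics of
estimating the fraction of HRTFs whose nearest neighbor lies within a given RMS distance.

```python
import math
import random

def rms_distance(a, b):
    """Root-mean-squared difference in dB between two composite spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def nearest_neighbor_fraction(database, set_size, threshold_db, trials, rng):
    """Estimate the probability that a randomly drawn HRTF in a randomly constructed
    set of the given size has a nearest neighbor closer than threshold_db."""
    hits = 0
    for _ in range(trials):
        subset = rng.sample(database, set_size)  # random set, in random order
        probe = subset[0]                        # hence a randomly drawn member
        nearest = min(rms_distance(probe, other) for other in subset[1:])
        if nearest < threshold_db:
            hits += 1
    return hits / trials

# Usage with synthetic 117-level vectors standing in for the 150 composite spectra.
rng = random.Random(0)
synthetic = [[rng.gauss(0.0, 2.0) for _ in range(117)] for _ in range(150)]
fraction_20 = nearest_neighbor_fraction(synthetic, 20, 2.0, trials=200, rng=rng)
```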

Another way to approach the issue of stability is to compute a significant statistic of the
dataset and determine how it changes as we vary set size. From the 150 composite spectra, or
vectors, a centroid HRTF is computed. The centroid, itself having 117 levels, is obtained by adding
together, for each of the 117 levels, the value representing the level of the HRTF from each of the
150 composite spectra and dividing each sum by the sample size, 150 in the example. If each of the




The Euclidean distance between the centroid and each of the 150 composite spectra [...]
means and the average of the 1,000 standard deviations for each subsample group size were
[...] change in the average mean as a function of [...] HRTFs [...]
Referring to Figure 22D, the average standard deviation changes about 25% in value as the
subsample group size goes from [...] HRTFs. As can be seen from both Figures 22C and 22D, there is
very little change in the average mean or average standard deviation for [...]
statistics of the 150 measured HRTFs are reasonably stable and we [...]
While the preceding has established that the initial database is sufficiently comprehensive
to cover an entire population of listeners, it should also be appreciated that not each of the 100-200
HRTFs contributes equally to that result. This is because there is considerable similarity or
correlation between certain groups within the entire database. This fact suggests that the raw
database can be pruned in some fashion to reduce the total number of HRTFs actually stored in the
device. Several different statistical techniques might be used to provide an organization of the
database that reveals the underlying correlations. These include one of the variety of
multidimensional scaling procedures known in the art. The procedure used in one exemplary
embodiment herein was cluster analysis. Specifically, we used a hierarchical agglomerative
clustering procedure such as that executed by the statistical program S-Plus™. This procedure uses
similarities between the HRTFs as measured in a distance matrix of all 150 HRTFs to produce an
ordered tree-like structure to the data. At the highest node of the cluster, all of the HRTFs are
contained. Successive nodes contain HRTFs that are similar to each other and different from the
remainder, just as biological animals are classified as orders, genera, and species. Figure 15 shows
a sample cluster of HRTFs obtained from four subjects. Implicit in this example is the fact that
HRTFs of the left and right ear of a single subject are usually nearer in distance than is one person's
HRTF to any other person's HRTF. Clustering provides a convenient ordering of the entire database,
so that subsets of HRTFs can easily be obtained by selecting similar groups determined by the nodes
in the cluster. Those skilled in the art will recognize from this disclosure that other methods of
ordering known in the art could be used.
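
A hierarchical agglomerative procedure of the general kind referred to above may be sketched as
follows. The average-linkage rule, the tie-breaking order and the function name are assumptions;
the text does not state which linkage was used in the S-Plus™ analysis, and this sketch is not
that program's implementation.

```python
def agglomerative_merge_order(dist):
    """Repeatedly merge the two closest clusters (average linkage, assumed).

    dist is a symmetric distance matrix. Returns a list of (a, b, d) tuples, where a and b
    are cluster ids and d is their average distance. Original items have ids 0..n-1 and
    each merge creates the next unused id, so the list describes a tree.
    """
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    next_id = len(dist)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for ai in range(len(ids)):
            for bi in range(ai + 1, len(ids)):
                a, b = ids[ai], ids[bi]
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(dist[i][j] for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges

# Usage: four items at positions 0, 1, 10 and 11 on a line form two tight pairs.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
tree = agglomerative_merge_order(dist)
```

The last entry of the returned list corresponds to the highest node, which contains all of the
items.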

A representative subset of HRTF sets from the entire set of 150 HRTF sets, from which a
listener can be matched, is chosen to simplify the matching process. In one embodiment, the HRTF
sets within a representative subset are stored for use according to the method of this invention. The
greater the number of HRTF sets stored in the device, from which listeners can be matched, the more
likely the listener will be matched to an HRTF set similar to the listener's own HRTFs. The
disadvantages of having a very large number of HRTF sets stored in the device are that more
memory is required to store the HRTF sets, with an accompanying increase in cost of the device. In
addition, it would take more time to match the listener with the best-match HRTF set.

In order to balance the competing factors in determining the number of representative HRTF 
sets to include in the device, we computed the mean minimum RMS distance between an HRTF set 
randomly selected from the entire measured database of HRTF sets (e.g., 150 HRTF sets) and the 
representative HRTF set, from the subset of representative HRTF sets chosen to be in the device, 
nearest to the randomly selected HRTF set, as a function of the number of representative HRTF sets 
chosen to be included in the device. Figure 22E shows the results from two different algorithms for 
[...] one of the 5 representative sets is selected, the user selects from among the five similar HRTF sets [...]
[...] the database that has the most HRTF sets within x, e.g., 2, dB of it. In a second step, the process of
selecting 15 representative sets proceeds by first selecting the most popular HRTF set as a
representative HRTF set, and then eliminating every HRTF set that was within x, e.g., 2, dB of the
most popular HRTF set from further selection in the database. The next most popular HRTF set,
which was not eliminated upon the selection of the most popular HRTF set, is then selected to be the
second representative HRTF set, and every remaining HRTF set in the database within x, e.g., 2, dB
of this HRTF set is accordingly eliminated. This process is repeated, moving down the list of
popularity of HRTF sets that remain in the database. Once 15 representative HRTF sets have been
selected, the process may be terminated. Naturally, it will be recognized that fewer or more
representative HRTF sets may be selected and that a stringency, i.e., x, of greater than about 1 dB
to about 4 dB may be imposed around each of the most popular HRTFs so as to arrive at about 15-
25 representative HRTF sets from the entire database of measured HRTF sets. From our statistical
analysis, we have found that 15-25 representative HRTF sets is preferred for the considerations
provided above.
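
The popularity-based selection may be sketched as follows, given a precomputed matrix of RMS
distances between HRTF sets. The function name, the "less than or equal" treatment of "within x
dB", the tie-breaking by index, and the computation of popularity once over the whole database
are assumptions made for this illustration.

```python
def select_representatives(dist, x_db=2.0, max_sets=15):
    """Greedy selection of representative HRTF sets by popularity.

    Popularity of a set is the number of other sets within x_db of it. Sets are visited
    from most to least popular; each chosen set eliminates every remaining set within
    x_db of it (including itself) from further selection.
    """
    n = len(dist)
    popularity = {
        i: sum(1 for j in range(n) if j != i and dist[i][j] <= x_db) for i in range(n)
    }
    order = sorted(range(n), key=lambda i: (-popularity[i], i))
    remaining = set(range(n))
    chosen = []
    for i in order:
        if len(chosen) >= max_sets:
            break
        if i not in remaining:
            continue  # already eliminated by an earlier, more popular set
        chosen.append(i)
        remaining -= {j for j in remaining if dist[i][j] <= x_db}
    return chosen

# Usage: six made-up sets placed on a line, with distances in dB.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 30.0]
dist = [[abs(p - q) for q in points] for p in points]
representatives = select_representatives(dist, x_db=2.0, max_sets=15)
```

Raising x_db yields fewer representatives, and lowering it yields more, which corresponds to
the stringency trade-off described above.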

Once a number of HRTF representative sets have been selected, the user selects the HRTF
set that he/she will use in listening to program material by any of several different methods. One
procedure is to present, via headphones, sounds filtered by a variety of HRTFs to convey the
impression of phantom sounds rotating about the listener's head. The programmed sounds are in fact
all chosen from elevations on the horizon. What is generally true of HRTFs is that the variation in
the filtered spectrum decreases as elevation increases. That is, the HRTF is generally flatter as the
elevation of the sound increases. It is also true that a listener using an HRTF that is very dissimilar
to his/her own will tend to hear the phantom sound much higher in elevation than that programmed.
Thus, when a listener hears a sound at a lower elevation, it generally means that the listener better
appreciates the structure in those HRTFs. Consequently, if one listens to a set of different HRTFs
programmed to produce the circle of phantom sounds on the horizon such as that illustrated in Figure
10, the HRTF set producing the lowest apparent elevation will provide the best means to localize
sound in the correct spatial location.

Summarizing the foregoing description, the present invention uses HRTF clustering as
illustrated in Figure 6B. As discussed above, the present invention collects and stores HRTFs from
numerous individuals in the HRTF database 63. These HRTFs are pre-processed by the HRTF
ordering processor 64 which includes an HRTF pre-processor 71, an HRTF analyzer 72 and an
HRTF clustering processor 73. The HRTF pre-processor 71 processes HRTFs so that they more
closely match the way in which humans perceive sound, as described above and further below. The
smoothed HRTFs are statistically analyzed, each one to every other one, to determine similarities and
differences between them by the HRTF analyzer 72. Based on the similarities and differences, the
HRTFs are subjected to a cluster analysis, as is known in the art and as described above, and may be
pruned to create a representative set of HRTFs, by HRTF clustering processor 73, resulting in
a hierarchical grouping of HRTFs. The HRTFs are then stored in an ordered manner in the ROM
65 for use by a listener. From these ordered HRTFs, the listener selects the set that provides the best
match via the HRTF matching processor 59. From the set of HRTFs that best match the listener,
the HRTFs appropriate for the location of each phantom speaker are input to their respective
HRTF processing circuits 10 to 14 of Figure 4.

Having provided a general description of the subject invention (see Figure 4 above), a
specific embodiment thereof is described in greater detail with reference to Figures 23 through [...].

Referring to Figure 23A, after measuring HRTF sets from a sufficiently large number of
individuals, 150 individuals in this example, and performing clustering analysis to select the most
representative group of HRTF sets, 15 HRTF sets in this example, the listener is matched to, or
selects, a best-match HRTF set from the 15 most representative HRTF sets. Initially, the HRTF sets
of the most representative group of HRTF sets, including the user selected best-match set of HRTFs,
are stored in an external EEPROM 704 to be accessed during the matching process.

Once the most representative group of HRTF sets is stored in the external EEPROM 704,
input left 601 and right 602 audio signals, typically from a CD player, VCR, laser disk player or
like source of audio signal, are inputted to a circuit 600 for processing of the signals to achieve
accurate spatialization of the sound transmitted to the user of the headphones.

The circuit 600 may be custom burned into read only memory on a silicon or like chip, or
an off-the-shelf, commercially available chip, such as a Motorola DSP 56007 chip, may be
programmed by downloading the appropriate connectors to an electrically erasable programmable
read only memory (EEPROM) 710 which reconfigures the DSP 56007 chip each time the chip
"wakes up." Referring to Figure 23B, within the circuit 600, the signals are first routed to a Dolby
Prologic® or like decoder 603, a well defined Dolby Laboratories standard known in the art. The
Dolby Prologic® decoder 603 provides four output channels, left 604, right 605, center 606 and
surround 607, intended for loudspeakers located to the front left 608, front right 609, front center
610, and rear center 611 of the listener, see Figure 23C, respectively. Before processing the multiple
output channels, such as the four Dolby Prologic® channels, by filtering with HRTFs, preferably the
center channel signal 606 is preprocessed within an early reflection 612 processing circuit to
simulate early reflections that sound waves would encounter in a non-anechoic environment. The
output signals of the early reflection processing circuit, the left early reflection 613 and the right early
reflection 614 signals, are preferably added 615, 616 to the left channel signal 604 and to the right
channel signal 605, respectively, yielding early reflection processed left 627 and right channel 628
signals.

Referring to Figure 24, one embodiment of this early reflection preprocessing, which is
intended to provide a sense of direction and spatial cue, comprises delay tap lines 618, 619 with
variable length filter delays 620, 621 and variable magnitude gains 622, 623 for the left and right
early reflections, respectively. The length of the delays 620, 621 and the magnitude of the gains 622,
623 can be adjusted, according to the simulated early reflections to be imposed on the signals, by,
for example, ambiance 696, theater 624, hall 625, or club 626 control buttons. Means for achieving
early reflection processing are known in the art (see U.S. Patent No. 5,371,799, incorporated here
by reference for this purpose).
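As an illustration only, the delay-tap arrangement of Figure 24 can be sketched in Python. This is a
minimal sketch, assuming a single tap per side; the function names, the preset table, and all delay
and gain values are hypothetical and are not taken from the specification.

```python
def tap(signal, delay, gain):
    """One delay-line tap: output[n] = gain * signal[n - delay]."""
    return [gain * signal[n - delay] if n >= delay else 0.0
            for n in range(len(signal))]


def add_early_reflections(left, right, center, preset):
    """Derive left/right early reflections from the center channel and
    add them to the left and right channel signals."""
    delay_l, gain_l, delay_r, gain_r = preset
    refl_l = tap(center, delay_l, gain_l)   # left early reflection
    refl_r = tap(center, delay_r, gain_r)   # right early reflection
    out_l = [a + b for a, b in zip(left, refl_l)]
    out_r = [a + b for a, b in zip(right, refl_r)]
    return out_l, out_r


# Hypothetical presets: (left delay, left gain, right delay, right gain),
# with delays expressed in samples.
PRESETS = {
    "theater": (3, 0.5, 4, 0.4),
    "hall": (5, 0.6, 7, 0.5),
    "club": (2, 0.3, 3, 0.3),
}

center = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
silence = [0.0] * 6
out_l, out_r = add_early_reflections(silence, silence, center, PRESETS["theater"])
```

Selecting a different preset changes only the delay length and gain magnitude of each tap, which
mirrors the role of the theater, hall, and club control buttons.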

Referring again to Figure 23B, next, within the circuit 600, the multiple channels of the
signal 627, 628, 606, 607 are processed 663 to create the sensation of phantom loudspeakers by
filtering each channel of the signal with a pair of HRTFs, from the best-match HRTF set,
corresponding to the intended location for that channel. As noted above, before the HRTF filtering
can occur the user is matched to a best-match HRTF set. The user is preferably matched to a
best-match HRTF set, from among the most representative group of HRTF sets of the total database of
HRTF sets measured, so that when used to process an audio signal the user perceives the
corresponding sounds to be localized in the proper spatial positions.

Referring to Figures 28A and 23A, one example of how this matching is accomplished is
shown in detail. The HRTF matching process begins by the user pushing the
control button (Ears control) 629, thus entering the HRTF matching mode. This places the user in
match mode 1 630. In match mode 1 630, the user may select from one of five clusters of HRTF sets
(sets 1-5) in the test bank. Representative HRTFs from each of the five clusters are copied from the
external EEPROM 704, which stores the most representative HRTF sets, into the internal RAM 631,
see Figure 23A, of circuit 600, for testing. The testing is accomplished by presenting the user, upon
the user pushing a noise control 703 button, with sound signals produced by a white noise process
632, Figure 28B, with a linearly decaying envelope 633. The user is first presented with a sound
processed by an HRTF 640 corresponding to a first predetermined virtual location, e.g., the front left
speaker 634, see Figure 28C, and then the user is presented with a sound processed by an HRTF 641
corresponding to a second predetermined virtual location, e.g., the rear left speaker 635, for each of
the representative HRTF sets of the five clusters copied to the RAM 631. The user sequentially
listens to each representative set by using the HRTF matching control button 636 to step through
the representative HRTF sets 1-5, and ultimately selects which of the sound signals, each generated
using one of the representative HRTF sets, the user perceives as most clearly arriving first from the
horizon to the user's front left and then from the horizon to the user's rear left. In this
embodiment, the user selects the clearest sound signal by pressing the OK button.

The next step is for the HRTF sets (sets 2.1-2.5 in Figure 28A) from the cluster
corresponding to the selected sound signal to be copied from the external EEPROM 704 into the
internal RAM 631 for testing. The user is again presented with sound signals processed first
by the HRTF 640 corresponding to the front left speaker 634 and then processed by the HRTF 641
corresponding to the rear left speaker 635, for each of the five HRTF sets 2.1-2.5 within the cluster
corresponding to the previously selected representative set (set 2 in Figure 28A). The user then
selects which of the sound signals, each associated with one of the HRTF sets 2.1-2.5 in Figure
28A of the selected cluster (2), the user perceives as most clearly arriving first from the
horizon to the user's front left and then from the horizon to the user's rear left. Again, in this
embodiment the user selects this sound signal by pressing the OK button. Upon pressing the OK
button, the selected HRTF set (set 2.2 in Figure 28A) is taken as the best-match HRTF set,
and the user leaves match mode.

In one embodiment, the majority of program material produced by a Dolby Prologic®
decoder occurs in the front speaker location (location 610 of Figure 23C). Thus, the matching can
be performed using sound signals produced by the white noise process
632, filtered by an HRTF appropriate for the frontal position. Fifteen such HRTFs are used, each
from the set of HRTFs associated with one of the 15 most representative individuals chosen from the
entire population of 150 HRTFs. The user selects the HRTF which produces the clearest perception
of a phantom sound source located directly in front of the listener. This can enable the matching
process to provide a match based on the needs of the application. It should be appreciated that other
tests may be more appropriate in other applications, but this simple test is adequate for the current
application. For example, if the application requires spatialization of sounds to the sides, HRTFs
corresponding to the sides can be used in the matching process.

In one embodiment of the invention, a seat control button 643 is provided which allows the user
to select a seat position within the virtual room. If a front-of-the-room seat position 644 is chosen,
the front left 634 and right 645 phantom speakers will be generated from an HRTF
set (2.2.4 in Figure 28A) measured from an appropriate azimuth angle, i.e., 40 degrees azimuth left
or right respectively. In addition, for the front-of-the-room seat position 644, the front left 634, front
center 646, and front right 645 virtual speakers will be louder than the rear virtual speakers. In
contrast, if a rear-of-the-room seat position 647 is chosen, the front left 634 and right 645 virtual
speakers will be generated by an HRTF set (2.2.1 in Figure 28A) measured from a smaller azimuth
angle, i.e., 10 degrees azimuth left or right respectively. Additionally, for the rear-of-the-room seat
position 647, the front left 634, front center 646, and front right 645 virtual speakers will be softer
than the rear left (surround left) 635 and rear right (surround right) 648 speakers.

Once the user has selected a seat position by pushing a seat control button 643, 10 HRTFs
651-660, corresponding to the selected seat position and the best-match HRTF set, are copied from
the external EEPROM 704 to the internal RAM 631 for use as digital filters. The 10 HRTFs
correspond to the front left, front center, front right, rear left (surround left), and rear right (surround
right) virtual speaker locations, with a left and right HRTF for each position 651, 652, 653, 654,
655, 656, 657, 658, 659, 660. These 10 HRTFs (651 through 660), from the best-match HRTF
set (2.2), provide the user with a best-match to the user's own head and pinnae filtering
characteristics and simulate the user's selected seat position. Note that for each of the 4 seat
positions 644, 661, 662, 647, 10 different HRTFs are copied to the RAM 631.

Referring to Figure 25, once the 10 HRTFs (651 through 660) are in the internal RAM 631
and available for filtering of the signal, the four standard Dolby Prologic® outputs after early
reflection preprocessing, 627, 628, 606, 607, are filtered with the HRTFs. In one
embodiment of the present invention, a fifth channel (second surround channel) 664 may be
generated by optionally inverting 665 the single Dolby Prologic® surround channel 607. This
inversion 665 aids in decorrelating the two surround channels. These two channels 607,
664 then become rear left (surround left) 607 and rear right (surround right) 664 channels.
Accordingly, the surround right channel 664 is identical to the surround left 607 channel, although
possibly inverted. Each of the five channels (left front 627, center front 606, right front 628, left
rear 607, and right rear 664) is then split into a right and left channel for filtering by the
corresponding HRTFs (651-660) stored in the RAM 631.
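The five-channel filtering described above can be sketched as follows. This is a minimal sketch,
assuming each HRTF is available as a short FIR impulse response and that the left-ear and right-ear
results are simply summed; the names `convolve` and `binaural_mix`, the dictionary keys, and the toy
impulse responses are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR filtering (full linear convolution)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def binaural_mix(channels, hrtfs, invert_surround=True):
    """Filter five virtual-speaker feeds with ten HRTFs (one left-ear and
    one right-ear response per position) and sum them per ear."""
    sign = -1.0 if invert_surround else 1.0
    feeds = {
        "front_left": channels["left"],
        "front_center": channels["center"],
        "front_right": channels["right"],
        "rear_left": channels["surround"],
        # Fifth channel: the single surround channel, optionally inverted.
        "rear_right": [sign * s for s in channels["surround"]],
    }
    length = max(len(feeds[p]) + len(h) - 1 for p in feeds for h in hrtfs[p])
    ear_l = [0.0] * length
    ear_r = [0.0] * length
    for position, feed in feeds.items():
        h_left_ear, h_right_ear = hrtfs[position]
        for n, v in enumerate(convolve(feed, h_left_ear)):
            ear_l[n] += v
        for n, v in enumerate(convolve(feed, h_right_ear)):
            ear_r[n] += v
    return ear_l, ear_r


# Toy data: unit impulses on every channel and trivial one-tap "HRTFs".
channels = {name: [1.0, 0.0] for name in ("left", "center", "right", "surround")}
positions = ("front_left", "front_center", "front_right", "rear_left", "rear_right")
hrtfs = {p: ([1.0], [0.5]) for p in positions}
ear_l, ear_r = binaural_mix(channels, hrtfs)
```

With identical toy HRTFs at both rear positions the inverted surround feed cancels the non-inverted
one; with real, position-dependent HRTFs the two rear feeds are filtered differently and do not cancel.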

Referring to Figure 23A, to prevent loss of HRTFs and other operating mode parameters
selected by the user at power-down and power-up, an EEPROM 710 stores all current parameters
of the system including current HRTFs, and its stored data is not disturbed by power-up/power-down
events. This EEPROM can save, after selection by user, multiple operating mode parameter presets,
which can be pulled up by a user by, for example, pushing a button.

The filtered left-ear and right-ear signals are summed to generate a single right and a single left
signal. These signals can be sent directly to a set of headphones for virtual speaker sensation, but
may not have the same "fullness" of sound as if the user were in an echoic chamber.

Referring to Figure 23B, to enhance the "fullness" of the sound experienced by the user,
bass boost 670 and reverberation 671 processing is preferably performed on the signals before
presentation to the user over headphones. These are known processes in the art. In particular,
the signals are first passed to the bass boost processing block 670. Referring to Figure 27, circuit
670 comprises, for example, a 100 Hz low-pass filter 672, 673 for each signal, left 668 and right 669,
to produce signals 681 and 682, followed by an amplification 674, 675 of gain G for each signal,
left and right. The gain G can be adjusted per the user's preference, up or down, to adjust the amount
of bass boost in the signals by using the bass control button 680. The left 676 and right 677 output
of the respective amplifiers are added to the respective left 668 or right 669 input signal to produce
a left bass boosted output 678 and a right bass boosted output 679 signal. The left bass boosted
output 678 and right bass boosted output 679 signals are essentially the original signal 668, 669 with
an added component comprising G times the output 681, 682 of the signal through a 100 Hz low-pass
filter 672, 673, thus boosting the bass component of the signals.
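A bass boost of this form (original signal plus a gain times its low-passed copy) can be sketched as
below. This is a sketch under stated assumptions: the text does not give the low-pass filter order or
the sample rate, so a one-pole low-pass and a 44.1 kHz rate are assumed, and the function name is
hypothetical.

```python
import math


def bass_boost(signal, gain, sample_rate=44100.0, cutoff_hz=100.0):
    """Return signal + gain * lowpass(signal).

    The low-pass filter is a one-pole smoother, chosen here only for
    illustration; the filter order is an assumption.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    low = 0.0
    out = []
    for x in signal:
        low += alpha * (x - low)      # low-passed copy of the input
        out.append(x + gain * low)    # original plus boosted bass component
    return out


# A constant (very low frequency) input is boosted by a factor of 1 + gain.
boosted = bass_boost([1.0] * 5000, gain=2.0)
```

Setting the gain to zero returns the input unchanged, which corresponds to turning the bass control
fully down.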

Referring to Figure 23B, the left bass boosted 678 and right bass boosted 679 output signals
are added to the output of a reverberation processing circuit 671, where the inputs 604, 605,
606, 607 to the reverberation processing block are the original Dolby Prologic® decoder
outputs before any other processing. The reverberation processing 671, in contrast to the early
reflection processing 612, provides the "fill" or architectural enhancement that an anechoic
simulation lacks. Referring to Figure 26, the reverberation processing comprises two
all-pole comb filters 683, 684, in parallel, the summed output of which 692 feeds into two all-pass
filters 685, 686 in parallel. The input signals 604, 605, 606, 607 are summed
together and the sum 688 is then inputted to the first comb filter 683 and to the second comb filter
684. Each all-pole comb filter 683, 684, as shown in Figure 26, loops the input signal upon itself
over and over again with the volume reduced by some fractional amount for each successive loop.
The looping has an associated time delay, t = [k] 690, and gain, Gc 691, which can be adjusted to
suit the user, and are adjusted by the user choosing among a theater 624, hall 625, or club 626
setting, with each setting having a unique pairing of length of time delay, t = [k] 690, and magnitude
of fractional gain, Gc 691. The summed output 692 of the two comb filters in parallel feeds two
all-pass filters 685, 686 in parallel. These all-pass filters provide a smearing effect in time to the
signal at its input without disturbing the frequency characteristics of the input. The all-pass filters
are non-linear phase distorters and remove some of the phase information as a function of frequency.
This allows decorrelation of the left 693 and right 694 reverberation outputs, even though the input
692 to the left and right all-pass filters is the same, without disturbing the frequency profile which is
embedded in the signal from the HRTF processing. The level of the left 693 and right 694
reverberation outputs is a function of gain, G, 695, which is controlled by the ambiance control
button 696.
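The comb and all-pass structure of Figure 26 can be sketched as follows. This is a minimal sketch
assuming a feedback (all-pole) comb and a Schroeder-style all-pass section; the delay lengths, gains,
and function names are hypothetical and only illustrate the signal flow.

```python
def comb(signal, delay, gain):
    """All-pole comb filter: y[n] = x[n] + gain * y[n - delay]."""
    out = []
    for n, x in enumerate(signal):
        feedback = out[n - delay] if n >= delay else 0.0
        out.append(x + gain * feedback)
    return out


def allpass(signal, delay, gain):
    """All-pass section: y[n] = -gain*x[n] + x[n - delay] + gain*y[n - delay]."""
    out = []
    for n, x in enumerate(signal):
        x_delayed = signal[n - delay] if n >= delay else 0.0
        y_delayed = out[n - delay] if n >= delay else 0.0
        out.append(-gain * x + x_delayed + gain * y_delayed)
    return out


def reverb(summed_input, comb_delays=(7, 11), comb_gain=0.5,
           allpass_delays=(3, 5), allpass_gain=0.5, level=1.0):
    """Two comb filters in parallel feed two all-pass filters in parallel."""
    a = comb(summed_input, comb_delays[0], comb_gain)
    b = comb(summed_input, comb_delays[1], comb_gain)
    comb_sum = [u + v for u, v in zip(a, b)]
    left = [level * v for v in allpass(comb_sum, allpass_delays[0], allpass_gain)]
    right = [level * v for v in allpass(comb_sum, allpass_delays[1], allpass_gain)]
    return left, right


impulse = [1.0] + [0.0] * 31
left_out, right_out = reverb(impulse)
```

Because the two all-pass sections use different delays, the left and right outputs differ even though
both are fed by the same comb-filter sum, which is the decorrelation effect the text describes.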

Referring to Figure 23B, the left 693 and right 694 reverberation outputs are summed 697,
698 with the left 678 and right 679 bass boost outputs, respectively. These summed left 701 and
right 702 signals are the left audio out 701 and right audio out 702 signals respectively. The left
audio out 701 and right audio out 702 can be sent directly to a set of headphones to provide the
listener with the sensation that the audio is originating from virtual speakers positioned according
to the seat control selection made by the user. In one embodiment, the headphones are connected via
wire to outputs 701 and 702. In another embodiment, 701 and 702 are signals sent via wireless
connection to a set of headphones (see Examples 2, 3, and 4).

Based on the foregoing disclosure, those skilled in the art will appreciate that the method
of selecting the best-match set of HRTFs from a sufficiently large database of measured HRTFs may
be varied considerably, without departing from the principles of this invention. Accordingly, with
reference to Figure 29A, by analogy to Figure 28A, with primed reference numerals in Figures 29A
and 29B relating to like elements in Figures 28A and 28B, it will be appreciated that a representative
set of 15 HRTFs (sets 1-15) may be stored in the test bank. The 15 representative HRTFs used are
predicted to accommodate roughly 95% of the population, with respect to variations in the spectral
properties of their impulse responses. Again, by analogy to Figure 28A and the foregoing
description, the HRTFs are copied, one at a time, from the external EEPROM into the internal RAM
of the DSP chip for testing. The user may test these HRTFs by asserting a test signal, see Figure
29B, which will be comprehended by analogy to Figure 28B. A white noise process with a linearly
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user, (b, .he so™, source is locaUzed a. ^ (i.e. o. a bor,.o«ai p,a.e *fi„ed by „e 

u. ..p^e, 0»ce.Ke.e,,.3s,a=„.^edase.or„RT.sU,a.sa«sfiesU.esee.«„.(.e^l^ 
^.eo.da.es.„..h™TPs«,,*e.e.e..s„„..„^e.^,^.^^^^,,„„^^<^^'^ 

HRTF processor to localize the virtual sound sources. In this scenario, the user is spared an
intermediate step of HRTF matching used in the system shown in Figure 28A.

From the foregoing disclosure, those skilled in the art will also recognize that in an
alternative embodiment, rather than matching a user to a representative set of HRTFs wherein the
HRTFs used to process an audio signal, for each spatial position, are measured from the same
individual, a user can instead be matched to separate representative sets of HRTFs for each spatial
position. The user would perform a matching step for each spatial location, wherein a subset of the
representative set, selected for the desired spatial position, would be used to process the audio
signal.

In matching these HRTFs, the listener would experience a sound source at each location.
The sound source may change for each location depending on the objective criterion for that location.
For example, the sound source may be speech for a location in which speech is the main information
to be presented. Another may be filtered white noise for those locations that will present ambient
sounds.



In selecting these HRTFs for each location, a listener would be allowed to choose across
the representative sets of HRTFs.
This allows the listener to custom develop a "best set of HRTFs" that best describe his/her
localization and perception characteristics at each location to be presented. Furthermore, an
interpolation algorithm could generate intermediate locations for the user's set of HRTFs as a
mixture of the selected HRTF sets.

Other modifications of these selection schemes will be obvious to those
skilled in the art based on this disclosure.

Example 1

In a specific embodiment, the statistical analysis of HRTFs comprises computing
eigenvectors and eigenvalues.
Such computations are known, for example, using the MATLAB® software program by The
MathWorks, Inc. An exemplary embodiment compares HRTFs by computing eigenvectors and
eigenvalues for the set of 2S HRTFs at L * N levels. Each subject-ear HRTF set may be described
by one or more eigenvalues. Only those eigenvalues computed from eigenvectors that contribute to
a large portion of the shared variance are used to describe a set of subject-ear HRTFs. Each
subject-ear HRTF may be described by, for example, a set of 10 eigenvalues.

In this embodiment, the cluster analysis procedure performed by the HRTF clustering
processor 73, shown in Figure 6B, is performed using a hierarchical agglomerative cluster technique,
for example the S-Plus® program, provided by MathSoft, Inc., based on the distance between each
set of HRTFs in multi-dimensional space. Each subject-ear HRTF set is represented in
multi-dimensional space in terms of eigenvalues. Thus, if 10 eigenvalues are used, each subject-ear
HRTF would be represented at a specific location in 10-dimensional space. Distances between each
subject-ear position are used by the cluster analysis in order to organize the subject-ear sets of
HRTFs into hierarchical groups. Hierarchical agglomerative clustering in two dimensions is
illustrated in Figure 14. Figure 15 depicts the same clustering procedure using a binary tree
structure.
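Hierarchical agglomerative clustering of this kind can be sketched as below. This is a sketch under
stated assumptions: the text does not name a linkage rule, so distance between cluster centroids is
assumed, and the function name and sample points are hypothetical. Each point stands for one
subject-ear HRTF set described by its eigenvalues.

```python
import math


def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Clusters are lists of point indices; closeness is the Euclidean
    distance between cluster centroids (an assumed linkage rule).
    """
    clusters = [[i] for i in range(len(points))]
    dims = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(dims)]

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge b into a
        del clusters[b]
    return clusters


# Five points in two dimensions, as in the two-dimensional illustration.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (5.0, 50.0)]
groups = agglomerate(points, 3)
```

Recording the order of the merges, rather than stopping at a fixed cluster count, yields the binary
tree form of the same result.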

This embodiment stores sets of HRTFs in an ordered fashion in the ROM 65 based on the
result of the cluster analysis. According to the clustering approach to HRTF matching, the present
invention employs an HRTF matching processor 59 in order to allow the user to select the set of
HRTFs that best match the user. In an exemplary embodiment, an HRTF binary tree structure is
used to match an individual listener to the best set of HRTFs. As illustrated in Figure 15, at the
highest level 48, the sets of HRTFs stored in the ROM 65 comprise one large cluster. At the next
highest level 49, 50, the sets of HRTFs are grouped based on similarity into two sub-clusters. The
listener is presented with sounds filtered using representative sets of HRTFs from each of two
sub-clusters 49, 50. For each set of HRTFs, the listener hears sounds filtered using specific HRTFs
associated with a constant low elevation and varying azimuths surrounding the head. The listener
indicates which set of HRTFs appears to be originating at the lowest elevation. This becomes the
current "best match set of HRTFs." The cluster in which this set of HRTFs is located becomes the
current "best match cluster."

The "best match cluster" in turn includes two sub-clusters, 51, 52. The listener is again
presented with a representative pair of sets of HRTFs from each sub-cluster. Once again, the set of
HRTFs that is perceived to be of the lowest elevation is selected as the current "best match set of
HRTFs" and the cluster in which it is found becomes the current "best match cluster." The process
continues in this fashion with each successive cluster containing fewer and fewer sets of HRTFs.
Eventually the process results in one of two conditions: (1) two groups containing sets of HRTFs
so similar that there are no statistically significant differences within each group; or (2) two groups
containing only one set of HRTFs. The representative set of HRTFs selected at this level becomes
the listener's final "best match set of HRTFs." From this set of HRTFs, specific HRTFs are selected
as a function of the desired phantom loudspeaker location associated with each of the multiple
channels. These HRTFs are routed to multiple HRTF processors for convolution with each channel.
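The descent through the binary tree of Figure 15 can be sketched as follows. This is a minimal
sketch: the tree layout, the set labels, and the `prefers_first` callback (which stands in for the
listener's judgment of which representative set sounds lowest in elevation) are all hypothetical.

```python
def match_hrtf_set(node, prefers_first):
    """Walk a binary cluster tree, keeping at each level the sub-cluster
    whose representative HRTF set the listener prefers."""
    while node.get("children"):
        first, second = node["children"]
        if prefers_first(first["representative"], second["representative"]):
            node = first
        else:
            node = second
    return node["representative"]


# Hypothetical tree with four HRTF sets labelled A to D.
tree = {
    "representative": "A",
    "children": [
        {"representative": "A",
         "children": [{"representative": "A"}, {"representative": "C"}]},
        {"representative": "B",
         "children": [{"representative": "B"}, {"representative": "D"}]},
    ],
}

# Stand-in for the listening test: a lower number means the listener
# perceives the sound at a lower elevation.
perceived_elevation = {"A": 3, "B": 1, "C": 2, "D": 0}


def prefers_first(rep_a, rep_b):
    return perceived_elevation[rep_a] <= perceived_elevation[rep_b]


best_match = match_hrtf_set(tree, prefers_first)
```

Each comparison halves the remaining candidates, so the listener hears far fewer sets than an
exhaustive comparison of the whole database would require.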



Example 2 



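Referring to Figure 7, the left 701 and right 702 audio out signals of Figure 23A (or 30 and 31
of Figure 4) can be inputted, for example 754, to a typical digital signal transmission system,
the analog output of which, for example 762, can be inputted to a set of headphones.
The left 701 and right 702 audio out signals can be outputted in
digital or analog format. If outputted in analog format, each signal can be converted to digital format.
In a preferred embodiment of this invention, after conversion to digital format, the left and right
audio signals are interlaced in time to create a single digital signal which carries both the left and
right channel information. For example, the single interlaced digital signal 755 can have a first
digital word, e.g., 16 bits, that is a right audio channel word, a
second digital word that is a left audio channel word, and thereafter alternating between right and
left. This single digital
signal 755 carrying both the left and right audio channel information can then be inputted, for
example 755 of Figure 7, to a typical digital signal transmission system.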

A standard digital signal transmission system, as shown in Figure 7, typically comprises a
transmitting station 751, a connecting medium called a channel 752, and a receiving station 753.
The transmitting station 751 can receive an analog signal 754 and convert it to a digital signal 755,
or can receive a digital signal 755 directly. Conversion of an analog to a digital signal for input
involves sampling the analog signal and quantizing each sample to the nearest of a number of
discrete signal levels. The discrete signal is sent to a source encoder 757, where each discrete
signal level is converted into a binary code word. The encoded signal is then sent
to a modulator 758, which modulates the signal for transmission over the channel 752. For
example, the modulator 758 can be an RF modulator, for which the corresponding channel is a
radio-frequency link. The receiving station
is essentially the inverse of the transmitting station, comprising a demodulator, a
decoder 760, and an optional digital-to-analog converter 761. The output from the receiving station
can accordingly be either an analog output 762 or a digital output 763.



Example 3



Important parameters and design considerations for a digital signal transmission system are
bandwidth of the channel, costs of the transmitting and receiving stations, power consumption of the
transmitting and receiving stations, and the particular binary waveform chosen for source encoding.
Bandwidth is important because it limits the amount of information that can be sent per unit time.
The selection of the binary waveform is important because the selection can affect bandwidth and
the costs, complexity, and power consumption of the transmitting and receiving stations. This
example provides a method for signal transmission that avoids certain problems, discussed below,
inherent in known transmission systems for digital signals, which enhances the fidelity of the HRTF
processed signal of this invention as it is sent to a listener.

Where a receiver, for example, within the receiving station of Example 2, has no clock which
is, a priori, synchronized to an incoming digital bit stream, the digital bit stream is called an
asynchronous signal. When an asynchronous binary format digital bit stream is received, the receiver
must, therefore, lock-on to the bit rate in order to generate a clock signal, tied to the bit rate, to
enable the receiver to decode the signal. Locking-on to the bit rate can be accomplished by known
methods, for example, using a phase-locked loop (PLL). However, there can be difficulties in
locking on to the bit rate when receiving digital audio signals represented in binary format (e.g.,
two's complement), which are often dominated by repeated strings of contiguous zeroes and/or ones.
For example, these strings of contiguous zeroes and/or ones can be encountered with audio signals
during moments of silence, or idle patterns. These strings of contiguous zeroes and ones can lead
to drifting of the output frequency of the PLL due to an imbalance in the charging and discharging
events within the PLL. When the output frequency of the PLL drifts, the PLL can lose its lock,
resulting in decoding errors, and thus degradation in the performance of the entire transmission
system. In contrast, a binary format digital signal without repeated strings of contiguous zeroes
and/or ones would give the PLL a balance of charging and discharging events, allowing the PLL to
track the digital signal's frequency more accurately.

Existing solutions for eliminating the drifting of the PLL's lock-in frequency due to repeated
strings of contiguous zeroes and/or ones have required additional bandwidth or complicated,
expensive hardware. For example, Manchester, or bi-phase-level encoding, commonly used for
digital audio signals, eliminates the drifting of the PLL. A Manchester encoded waveform transmits
the symbol 1 as a positive pulse for half of the symbol interval, followed by a negative pulse for the
remaining half of the symbol interval, and transmits the symbol 0 with the opposite
polarity. Therefore, using Manchester encoding, even with binary format digital signals having
repeated strings of zeroes and/or ones, receiver clock timing
can be extracted without drifting of the
PLL by providing a charging and discharging event for the PLL in each symbol interval.

The subject invention provides an encoding technique for digital signals, including, but not
limited to, digital audio signals, encoded in a binary format such as two's complement. The subject
technique involves: (a) removing the DC component of the signal, so that there are sufficient
transitions between zeroes and ones at each bit location, even during idle patterns; (b) inverting
every other bit of the binary encoded digital signal to provide sufficient transitions between
adjacent bits; and (c) encoding a locking bit on the digital signal to enable the receiver to lock on to
the position of the digital words in the digital bit stream. In addition to having the DC component
removed, the signal should have enough self-noise to ensure frequent transitions between positive and
negative values of the signal. Note, if a signal does not have sufficient self-noise, noise can be added.

The subject encoding technique operates on an input binary encoded digital signal, typically
encoded in two's complement. The first step of the subject technique is to remove the DC component
of the input binary encoded digital signal, if present. Since the DC component of the signal is
removed, this technique is applied to signals where DC coupling is not critical, as in the audio signals
of this invention. Since the human ear cannot detect DC sounds, the DC component is not important
with respect to digital audio signals. Therefore, this technique is particularly advantageous with
respect to processing digital audio signals.

With reference to Figure 8A, the left 701 and right 702 audio out signals (or 30 and 31 of
Figure 4) can be outputted in digital or analog format. If outputted in analog format, each signal can
be converted to digital format 901. In a preferred embodiment of this invention, after conversion to
digital format, the left and right audio signals are interlaced in time to create a single digital signal
901 which carries both the left and right channel information. For example, the single interlaced
digital signal 901 can have a first digital word, e.g., 16 bits, that is a right audio channel word, a
second digital word that is a left audio channel word, and thereafter alternating between right and left
(see Figure 9G). This single digital signal 901 carrying both the left and right audio channel
information can then be inputted as shown in Figure 8A.
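The interlacing of right and left 16 bit words can be sketched as follows. This is a minimal sketch;
the function names are hypothetical, and samples are assumed to be Python integers that are masked to
their 16 bit two's complement form.

```python
def interleave(left_words, right_words):
    """Interlace two channels into one word stream, right word first."""
    stream = []
    for left, right in zip(left_words, right_words):
        stream.append(right & 0xFFFF)   # right audio channel word
        stream.append(left & 0xFFFF)    # left audio channel word
    return stream


def deinterleave(stream):
    """Split the interlaced stream back into (left_words, right_words)."""
    return stream[1::2], stream[0::2]


stream = interleave([1, 2], [10, 20])
left_words, right_words = deinterleave(stream)
```

Masking with 0xFFFF maps a negative sample such as -1 to the 16 bit pattern 0xFFFF, which is its
two's complement representation.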

It is preferred that the DC be removed 902 from the signal after the signal is in digital form
901, rather than from the analog signal prior to digitization. When one attempts to remove the DC
component of an analog signal before digitization, a small DC component is typically introduced into
the digital signal during conversion from analog to digital. This DC component introduced into the
digital signal is inherent in known analog-to-digital converters and, even though small, is undesirable
when implementing the subject invention. For instance, during idle patterns of the signal, this
residual DC component can cause bit locations to "stick" (i.e., remain in a zero state or a one state)
for long periods. This "sticking" can make it possible for the receiver to mistake a "sticking" bit as
a locking bit, which, as discussed in greater detail below, is a bit which can be encoded on the digital
signal and, typically, is always a zero or always a one.

Removing the DC component 902 can be accomplished by many known techniques, for
example, by passing the signal through a high pass digital filter. This high-pass filter can be, for
example, an infinite impulse response (IIR) high pass digital filter. It is important, when designing
the apparatus which is to remove the DC component from the digital signal, that the apparatus does
not detrimentally affect the non-DC components of the digital signal. In a specific embodiment, a
first-order Butterworth digital high-pass filter, with a 20 Hz corner frequency, is used. In a preferred
embodiment, an adaptive filter is used to remove the DC component.

In a preferred embodiment, an adaptive filter such as that shown in Figure 8B is used to
remove the DC component 902 of the input binary encoded digital signal 901, generated by
interlacing in time the digital format representation of left 701 and right 702 audio out signals of
Figure 23A (or left 30 and right 31 earphone signals of Figure 4). For clarity, one can define the left
channel words within 901 as 901l and the right channel words as 901r. The input binary encoded
digital signal, in a specific embodiment, can be a 16 bit word signal where left and right channel
words are interlaced in time such that the first 16 bit word represents the first right channel word
and the second 16 bit word represents the first left channel word. Accordingly, each successive 16
bit word alternates between right channel and left channel. In this case, when removing the DC
component 902, it is required to separately remove the DC from the right channel 901r and the left
channel 901l, due to the independence of the right channel and left channel signals. Therefore, the
right 901r and left 901l channels are split apart to be operated on independently for removal of the
DC component 902.

For clarity of discussion, the processing of the left channel 901l will be explained, noting
that the right channel 901r undergoes the same processing independently. Referring to Figure 8B,
the digital word of the input signal 901l is first summed 771 with a tracking constant C(k) 772,
which can initially be zero. The sum 773, which is also the output of the adaptive filter, then is
compared to zero 774, for example, by observing the sign bit of the word. If the word is less than
zero 775, the tracking constant C(k) 772 is increased by a step size Q1 776, C(k+1) = C(k) + Q1.
Alternatively, if the word is greater than zero 777, the tracking constant C(k) 772 is decreased by
a step size Q2 778, C(k+1) = C(k) - Q2. The tracking control variables, Q1 and Q2, are dependent
upon the amount of gain desired in the adaptation control circuit. This adaptive filter effectively
integrates out an average, or DC component, and continually removes it from the source signal.
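The tracking-constant update can be sketched as follows. This is a minimal sketch, assuming integer
samples; the function name and parameter names are hypothetical. The text specifies the update for
outputs below and above zero; treating an output of exactly zero like a negative output is an
assumption made here.

```python
def remove_dc(samples, q_up, q_down, c0=0):
    """Adaptive DC removal: output = input + tracking constant C(k)."""
    c = c0
    output = []
    for x in samples:
        y = x + c               # sum of input word and tracking constant
        output.append(y)
        if y > 0:
            c -= q_down         # C(k+1) = C(k) - Q2
        else:
            c += q_up           # C(k+1) = C(k) + Q1 (zero handled here by assumption)
    return output


# A constant offset of 100 is tracked out after about 100 steps of size 1.
settled = remove_dc([100] * 300, q_up=1, q_down=1)

# With the downward step twice the upward step, a silent input keeps
# cycling through 0, +1, -1 instead of resting at a fixed value.
idle = remove_dc([0] * 6, q_up=1, q_down=2)
```

Equal step sizes suit inputs that already carry enough noise to cross zero; unequal step sizes make
the filter itself supply a small alternating component during silence.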

When the input signal 901l or 901r has sufficient self-noise to ensure transitions between
positive and negative values even after the DC component is removed, then it is preferred that Q1
and Q2 be equal in size. In addition, referring to Figure 8A, if the input signal 901l or 901r does not
have sufficient self-noise, a noise generator 924 can be used to add in sufficient noise. In a preferred
embodiment, if the input signal 901l or 901r does not have sufficient self-noise, the adaptive filter
of Figure 8B can be used to both remove the DC component and add in sufficient noise, for example
by having Q2 = 2Q1. In this embodiment, an input signal 901l or 901r having a DC component of
zero, with no noise, would first be increased by Q1 to a value of Q1, then would be decreased by Q2
= 2Q1 to a value of -Q1, then be increased by Q1 to a value of zero, and thus repeat through these
values. This ensures that each bit location undergoes transitions between the zero and one states
even during idle patterns.

Referring to Figures 9A, 9B, and 9C, the results of a computer simulation of removing the
DC component from a gaussian noise source using an adaptive filter, as shown in Figure 8B, are
illustrated. In this simulation, a gaussian noise source with a variance of 2.5 mV and a mean of 0.5 V
is introduced to the adaptive filter. For this simulation, a value for both Q1 and Q2 of 0.488 mV is
used. Figure 9A shows the original gaussian noise source waveform, Figure 9B shows the value of
the tracking constant, C[k], and Figure 9C shows the output waveform of the adaptive filter. These
plots are over 2048 samples or about 52 msec. The output waveform clearly has the DC component
removed in the latter half of the plot.

Referring to Figures 9D and 9E, the magnitude frequency response of the input gaussian
noise waveform and DC shifted output waveform are shown, where Figure 9D is up to 2x10^4 Hz
while Figure 9E shows an expanded view up to 1000 Hz.

Once the DC component has been removed, the next step is to toggle every other bit 903
of the signal. This toggling can be accomplished by known means, for example, by exclusive ORing
the signal with a sequence of alternating ones and zeroes, i.e., ...1010...10... The output of an
exclusive OR gate is a one if, and only if, only one of the two inputs is a one. Therefore, when an
input is exclusive ORed with a zero, the output is the same as the input. However, when an input
is exclusive ORed with a one, the output is an inversion of the input. For example, a one exclusive
ORed with a one gives an output of zero and a zero exclusive ORed with a one gives an output of
one. Referring to Figure 8A, in a specific 16 bit embodiment, every other bit of the encoded signal
is inverted by exclusive ORing 903 each word of the signal with 1010101010101010. It should be
noted that one could alternatively exclusive OR the signal with 0101...01 and adjust the receiver
accordingly. The purpose of this toggling, or inverting of every other bit, is to provide sufficient
transitions between adjacent bits to enable a receiver to lock-on to the bit rate. In combination, the
removal of the DC component, and subsequent inverting of every other bit, ensures that there will
not be repeated strings of contiguous ones or zeroes, and that each bit location is guaranteed to
alternate, or flip flop, between the one and zero states, even during idle patterns of the signal.
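As a minimal sketch of the 16 bit toggling step, the exclusive OR with 1010101010101010 can be written as below. The names `TOGGLE_MASK_16` and `toggle_alternate_bits` are illustrative only and do not appear in the specification.

```python
TOGGLE_MASK_16 = 0b1010101010101010   # alternating ones and zeroes

def toggle_alternate_bits(word, mask=TOGGLE_MASK_16):
    """Invert every other bit of a 16 bit word by exclusive ORing with the mask."""
    return (word ^ mask) & 0xFFFF

idle_word = 0b0000000000000101        # small positive value, mostly zeroes
sent = toggle_alternate_bits(idle_word)
restored = toggle_alternate_bits(sent)  # applying the same mask again undoes it
```

Because exclusive ORing twice with the same pattern returns the original word, the receiver can recover the data with the same operation, provided its pattern is aligned with the transmitted words.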

To illustrate, in a specific embodiment, 24 bit signed two's complement encoding is used.
The most significant bit location is the sign bit in the two's complement binary format, where the
sign bit is zero for positive and one for negative signal values. Since the DC component of the
digital signal has been removed, the digital signal frequently transitions between positive and
negative. Therefore, the sign bit location is equally likely to be a one or a zero. Combining the
removal of the DC component with the inversion of every other bit ensures each of the remaining 23
bit locations in this 24 bit illustration are also just as likely to be a one or a zero, and there are no
repeated strings of contiguous ones or zeroes remaining in the signal.




By contrast, even when the DC component is removed, if every other bit were not inverted,
the 24 bit signal would frequently have positive value words having a string of zeroes in the most
significant bits during idle patterns, such as 000000000000000000100101, with only the least
significant bits being in a different state than their neighbor bits. Likewise, there would also be many
negative value words, with a string of ones in the most significant bits such as
111111111111111110101110, again with only the least significant bits flip-flopping. If the signal,
for example due to noise, were such that the signal remains positive or negative for relatively long
periods, then these most significant bits can "stick" at a particular value, zero or one, for an equally
long period. These "sticking" bits could be mistaken for a locking bit, wherein a locking bit is a bit
which can be encoded on the digital signal and, typically, is always a one or always a zero. A locking
bit can be located at a certain bit location within a word to allow a receiver to lock-on to the location
of the words within the signal by locking on to the locking bit. However, according to the subject
invention, after exclusive ORing the signal with 1010...10, 000000000000000000100101 is
converted to 101010101010101010001111 and 111111111111111110101110 is converted to
010101010101010100000100. Therefore, after exclusive ORing the signal with 1010...10, it is
ensured that the PLL will receive a balanced number of charging and discharging events as well as
numerous transitions at the bit rate, thus allowing the PLL to stay locked-on to the bit rate.
Additionally, the noise on the signal, sufficient to ensure transitions between positive and negative
values of the signal, ensures that no bit will "stick" in a certain state for too long even during idle bit
patterns.
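The two 24 bit conversions above can be checked with a short script. The helper names `toggle_24` and `longest_run` are illustrative, and the run-length measure is simply one convenient way to quantify "strings of contiguous ones or zeroes"; it is not a figure of merit taken from the specification.

```python
MASK_24 = int("10" * 12, 2)           # 1010...10 over 24 bits

def toggle_24(bits):
    """Exclusive OR a 24 character bit string with 1010...10."""
    return format(int(bits, 2) ^ MASK_24, "024b")

def longest_run(bits):
    """Length of the longest string of identical adjacent bits."""
    best = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

positive = "000000000000000000100101"
negative = "111111111111111110101110"
positive_sent = toggle_24(positive)
negative_sent = toggle_24(negative)
```

For these two words the longest run drops from 18 to 4 and from 17 to 5 respectively, which illustrates why the toggled signal gives the PLL many more transitions to work with.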

A "code violation" within the signal can be used to allow the receiver to determine where
each word begins. In order to provide this code violation, a locking bit can be placed at certain
locations within the signal. For example, in an audio signal, right and left channel words can be
interlocked in time, where each channel can have, for example, 16 bits as shown in Figure 9G. In
this case, the locking bit can be located in a certain position of the right channel word, for example,
in the least significant bit location. This locking bit then gives the location of the right channel word,
as well as the location of the left channel word. This locking bit can be, for example, always a zero
or always a one, which allows a receiver to lock on to the locking bit and, therefore, the word pattern
of the digital bit stream. In a specific 16 bit word embodiment, after removing the DC and exclusive
ORing with 1010...10, each, for example, right word is ANDed 904 with 1111111111111110. This
AND operation leaves the first 15 bits of the 16 bit word unchanged and necessarily encodes a zero
in the 16th bit location. This guarantees that each right word has as a locking bit, a zero in the least
significant bit location, to allow determination of the location of each word in the digital signal at
the receiver. It is important to note that it is not necessary for each word or even every other word
to have a locking bit encoded on it. Indeed, a locking bit could be encoded on every third or fourth
word. In fact, the limit as to how far apart locking bits can be spaced is determined by the cost and
complexity of the receiver to be used.
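A minimal sketch of the 16 bit encoding order described above (toggle, then force the locking bit) follows. The function and constant names are illustrative assumptions, and the DC removal stage is taken to have been applied already.

```python
TOGGLE_MASK = 0b1010101010101010      # exclusive OR pattern 1010...10
LOCK_MASK = 0b1111111111111110        # AND pattern that zeroes the 16th bit

def encode_left_word(word):
    """Left words are only toggled in this sketch."""
    return (word & 0xFFFF) ^ TOGGLE_MASK

def encode_right_word(word):
    """Right words are toggled, then ANDed so the least significant bit is zero."""
    return ((word & 0xFFFF) ^ TOGGLE_MASK) & LOCK_MASK

encoded = [encode_right_word(w) for w in (0x0000, 0x0001, 0x7FFF, 0xFFFF)]
```

Note that the least significant bit of 1010...10 is a zero, so the toggling step never inverts the locking bit position; the zero written by the AND operation is therefore still a zero after the receiver applies the same pattern.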

Once processed as described above, the signal can be transmitted via a wired connection to 
headphones or through the air. In a specific example, referring to Figure 8A, for wireless 
transmission, the signal is inputted to a frequency shift keying (FSK) transmitter 905, such as a 
RF9901 FSK transmitter chip from RF Micro Devices, which modulates the signal for transmission 
from a transmitting loop antenna 906. A corresponding receiving loop antenna 907 receives the 
incoming FSK modulated signal and sends the signal to a FSK receiver 908, such as a RF9902 FSK 
receiver chip from RF Micro Devices, which demodulates the signal. The demodulated signal can
then be inputted to conventional two transducer headphones for listening. 

The receiver should be able to lock on to the bit rate and then lock on to the locking bit in
order to decode the signal. Referring to Figure 9F, the receiver can comprise a phase lock loop 815,
which provides a master clock 804 and aligns the clocking bits with the data bits provided from, for
example, an RF demodulator. The receiver can further comprise a state machine 800, which can be
the center of the timing for the receiver, and can also perform a number of operations including:
clocking functions for the D/A converter, reclocking of the data delivered to the D/A, and control
lines for master reset. The state machine can provide a serial clock 805, SCLK, a left/right clock
806, L/R CLK, and data 803, SDATA, to a D/A converter. The state machine 800 can, for example,
be a free running eight bit counter. Where the signal is transmitted wirelessly, the state machine 800
receives the RF data 801 (RF Digital) and inverts the bits which were inverted prior to transmission,
by exclusive ORing RF Digital 801 with a clocking signal Q3 802 which has a frequency one half
of the bit rate (or 1/16 of the master clock). The data stream can then be latched to produce a strong,
clean data bit stream, 803 (SDATA), to present to the D/A converter.

The locking bit is encoded on the incoming data stream, RF Digital 801, to allow the
receiver to maintain word lock. The locking bit can be, for example, always 0 (logic level low) in
the least significant bit of the digital data word. The state machine 800 looks for the locking bit
during a window of time, the locking bit window 808, to determine if lock is being maintained. If
a 0 is present, no action is taken; however, if a 1 is detected, the state machine 800 resets itself via
its reset control line 809. After resetting, the state machine 800 can, for example, start over at a new
data position and the process continues until lock is regained. It should be understood that the
locking bit could always be 1 and then the state machine would reset upon detecting a 0 during the
locking bit window 808.

In a specific embodiment, returning to Figure 8A, the demodulated signal output from the
FSK receiver 908, called RFDIG 801, is in the same binary format as the signal which entered the
FSK transmitter 905. In order to decode the signal, it is inputted to a phase-locked loop (PLL) 815
and also inputted to an exclusive OR gate 917 to be exclusive ORed with 1010...10. The PLL 815
is able to lock on to the frequency of the bit rate due to sufficient bit transitions provided by the
exclusive ORing of the signal with 1010...10 prior to transmission, which provides a strong
frequency component at the bit rate and provides the PLL 815 a balanced number of charging and
discharging events. The output of the PLL 815 is the master clock 804, MCLK, which has a
frequency eight times the bit rate. The MCLK is inputted to a divide-by-eight state machine 912
with the output thereof, at a frequency equal to the bit rate, fed through a feedback loop 913 to the
PLL 815 and fed to latch 916. Additionally, MCLK 804 is inputted to a state machine 800 which
generates clock signals at MCLK/2 (or Q0) 810, MCLK/4 (or Q1) 811, MCLK/8 (or Q2) 805,
MCLK/16 (or Q3) 802, MCLK/32 (or Q4) 812, MCLK/64 (or Q5) 813, MCLK/128 (or Q6) 814 and
MCLK/256 (or Q7) 806, wherein MCLK/2 means a clocking signal at the MCLK frequency divided
by 2, etc. Figure 9G shows how these clock signals align with each other, the input signal RF digital
801, the output of exclusive OR gate 917, XOR output 816, and the output of latch 916, SDATA
803.
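The relationship between the divided clocks and the bit rate can be tabulated with a small helper. This is only arithmetic on the ratios stated above; the function name `divider_frequencies` and the 1 MHz example bit rate are assumptions for illustration.

```python
def divider_frequencies(bit_rate_hz):
    """Frequencies of Q0..Q7 when MCLK runs at eight times the bit rate."""
    mclk = 8 * bit_rate_hz
    return {f"Q{n}": mclk / 2 ** (n + 1) for n in range(8)}

freqs = divider_frequencies(1_000_000)    # hypothetical 1 MHz bit rate
```

Under these ratios Q3 (MCLK/16) runs at half the bit rate, so it is high for one bit and low for the next, and Q7 (MCLK/256) completes one cycle every 32 bits, that is, once per pair of 16 bit words.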

Figure 9G shows two 16-bit words, right channel word D15, D14, ..., D0, and left channel
word D15, D14, ..., D0, from a digital bit stream, RFDIG 801 in Figure 8A. Note, these two 16-bit
words could be considered one 32-bit word. In this embodiment, the first D15, D14, ..., D0 can be
a right channel word and the next D15, D14, ..., D0 can be a left channel word. MCLK/8 (or Q2
805) is referred to herein as SCLK, the data clock at twice the bit rate, which can be used to
determine the state, one or zero, of each bit. To lock on to the locking bit, located at D0 of the right
channel word, an eight input NAND gate 915 with inputs NOT Q7 817, Q6 814, Q5 813, Q4 812,
Q3 802, NOT Q2 818, NOT Q1 819, and a bit value from latch 916, SDATA 803 after inversion,
922, is used. Latch 916 can delay each bit for one cycle of MCLK/4, or one-half the duration of a
bit. Therefore, the output from latch 916, SDATA 803, is delayed with respect to the output of the
exclusive OR 917, by one-half the duration of a bit. This latching and delay allows the bit to be
clean and strong during the locking bit window 808. Figure 9G illustrates the alignment of SDATA
803, and the various clock signals when the state machine is in lock with the locking bit.

However, before attaining lock on to the locking bit, the bit value during the locking bit
window 808, one or zero, from latch 916 is the bit value of Dn, which is any one of D15, D14, ...,
D0, D15, D14, ..., D0 from either the left or right channel word as shown in Figure 9G. The bit
value of Dn is obtained by exclusive ORing 917 RFDIG 801 with Q3 802. Exclusive ORing 917
Q3 802 with RFDIG 801 inverts the previously inverted bits to generate a data signal, XOR output
816, which is a replica of the original binary coded format signal 901 with the DC removed. Q3 802
is synchronized with RFDIG 801, by locking on to the bit rate. After the PLL 815 has locked on to
the bit rate, the locking bit is located by first resetting the state machine at a random position within
the two 16 bit word cycle. If the output 921 of the NAND gate 915, after inversion by inverter 920,
is a zero, then the selected bit is a one and therefore not the locking bit. Alternatively, the inverted
NAND gate 915 output 921 will be one only when the inverted bit 922 from SDATA 803 is a one,
corresponding to the bit from SDATA 803, the locking bit, being a zero. The inverted NAND gate
915 output, 921, can only be a one if the inverted bit 922 from SDATA 803 is a one at the same time
that NOT Q7 817 is a one, Q6 814 is a one, Q5 813 is a one, Q4 812 is a one, Q3 802 is a one,
NOT Q2 818 is a one, and NOT Q1 819 is a one, based on the inputs to the NAND gate 915. As
can be seen from Figure 9F, this only occurs at the D0 bit location of the right channel word.
Therefore, if Dn (n≠0) is arriving when D0 should arrive, then the inverted NAND 915 output 921
remains zero until Dn eventually becomes a zero.
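The gating condition can be explored with a small truth-table sketch. It models only the combinational logic listed above; the function names are illustrative, and the mapping from counter value to bit slot assumes that the eight bit counter reads zero at the start of the right channel word, which is an assumption rather than something stated in the text.

```python
def counter_bits(count):
    """Q0..Q7 of a free running eight bit counter clocked by MCLK."""
    return [(count >> n) & 1 for n in range(8)]

def lock_window_output(count, sdata):
    """Inverted output 921 of the eight input NAND gate 915."""
    q = counter_bits(count)
    inputs = [1 - q[7], q[6], q[5], q[4], q[3], 1 - q[2], 1 - q[1], 1 - sdata]
    nand = 0 if all(inputs) else 1
    return 1 - nand

# Counter values at which a zero on SDATA drives the inverted output to one.
window_counts = [c for c in range(256) if lock_window_output(c, 0) == 1]
window_slots = sorted({c // 8 for c in window_counts})   # eight MCLK cycles per bit
```

In this model the window opens for two MCLK cycles inside bit slot 15, the last bit of the first 16 bit word, and the output can never be a one while SDATA is a one.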

If, in Figures 8A and 9G, Dn is a one, then the inverted NAND gate 915 output 921 is zero,
and the state machine 800 can be instructed to reset to the bit following Dn, namely Dn+1. Since
each bit location from D15, D14, ..., D0, D15, D14, ..., D0 is guaranteed to alternate between one
and zero, except the locking bit, D0 of the right channel word which is always zero, the state machine
can quickly lock on to the location of the locking bit. In this synchronized state, lock-on to the
locking bit has been achieved. The need to locate the locking bit is why it is imperative that each of
the other bit locations are guaranteed to switch to a one state some time in the bit stream such that
no other bit location remains in the zero state long enough to be mistaken as the locking bit.
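The search for the locking bit can be illustrated with a simplified, bit-level simulation. It is a sketch under several assumptions: the data bits are modelled as random, the receiver is taken to examine one candidate bit per 32 bit frame, and a reset is modelled as moving the candidate to the very next bit. The names `make_stream` and `acquire_lock` are hypothetical.

```python
import random

FRAME_BITS = 32      # one right word followed by one left word
LOCK_SLOT = 15       # D0 of the right word, counting its D15 as slot 0

def make_stream(frames, rng):
    """Random data bits with the locking bit position forced to zero."""
    bits = []
    for _ in range(frames):
        frame = [rng.randint(0, 1) for _ in range(FRAME_BITS)]
        frame[LOCK_SLOT] = 0
        bits.extend(frame)
    return bits

def acquire_lock(bits, start):
    """Return the frame position the receiver is watching at the end of the stream."""
    pos = start
    phase = start % FRAME_BITS
    while pos < len(bits):
        phase = pos % FRAME_BITS
        # A one cannot be the locking bit: move on to the following bit.
        # A zero is accepted for now and rechecked one frame later.
        pos += 1 if bits[pos] == 1 else FRAME_BITS
    return phase

stream = make_stream(2000, random.Random(1))
final_phases = {acquire_lock(stream, s) for s in range(FRAME_BITS)}
```

Every starting position ends up at the locking bit slot because any other position eventually carries a one, while the true position never does. This is the reason the text insists that all other bit locations must switch to a one at some point.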

Example 4 

In an embodiment such as described in Example 2 or Example 3, if the digital signal is
wirelessly transmitted through the air, for example from an FSK transmitter to a FSK receiver, the
receiver can be located in a remote unit while the transmitter can be located in a base unit. The base
unit can, for example, comprise the HRTF processing circuitry including DSP chip 600, EEPROM
710, and External EPROM 704, such as exemplified in Figure 23A, as well as the signal processing
circuitry 901, 924, 902, 903, 904, FSK transmitter 905, and transmitting loop 906, such as
exemplified in Figure 8A. The remote unit can, for example, comprise receiving loop 907, FSK
receiver 908, PLL 815, state machine 800, NAND gate 915, and associated circuitry exemplified in
Figure 8A, as well as input means for HRTF matching control 636, OK control 637, Noise control
703, Bass control 680, Ears control 629, Seat control 643, Ambience control 696, Theater control
[illegible] of the signal to the base [illegible]

[illegible] signal. These signals are transmitted [illegible]

In order for the remote unit to determine if the base received the IR signal, the base sends
[illegible] which, when received by the remote, indicates receipt by the base unit of an IR
signal from the remote unit.

This tag bit is a bit encoded similarly to the locking bit. For example, if the locking bit is
encoded in the least significant bit location of the right channel word, [illegible]
of the tag bit. For instance, if the locking bit is encoded as a one, or a zero, then the tag bit
[illegible] as a zero, or a one, respectively. In a specific embodiment, [illegible]
is encoded as a zero, the default value of the tag bit can thus be a one and
be encoded by ORing each left channel word with 0000000000000001.

[illegible] received by the base unit. When the base [illegible]
ANDs at least one left word with 11...10 instead of ORing with 00...01. In a preferred
[illegible] eight consecutive tag bits [illegible]




channel word. In this embodiment, the receiver of the remote unit monitors the tag bit much like it
monitors the locking bit. For example, an additional eight input NAND gate similar to NAND gate
915 having inputs Q7 806, Q6 814, Q5 813, Q4 812, Q3 802, NOT Q2 818, NOT Q1 819, and a
bit value from latch 916, SDATA 803, after inversion, 922, is used. Note, these are the same inputs
for monitoring the locking bit location, except NOT Q7 817 is replaced with Q7 806. Figure 9F
illustrates the alignment of SDATA 803, and the various clock signals when the state machine is in
lock with the locking bit.

If the inverted output of the NAND gate is a zero, then the tag bit is a one and therefore no
IR signal has been received by the base. Alternatively, the inverted output of the NAND gate will
be a one only when the inverted bit 922 from SDATA 803 is a one, corresponding to the bit from
SDATA 803, the tag bit, being a zero. A zero value for the tag bit signifies the base unit has
received an IR signal from the remote.

The state machine 800 only looks for the tag bit during a small window in time, the tag bit
window 820, after a command is sent via the IR link. The remote clears the tag bit latch, transmits
the command word over the IR, and then watches for a zero bit to be latched onto the tag bit control
line. If a zero is latched, then the command was received by the DSP, the base; if a one is latched,
then the command was not received and no action is taken by the remote unit. When a one is latched
and no action is taken by the remote, the user would be required to press the command button again
and resend the command over the IR link. Once the receiver locks on to the locking bit, the location
of the tag bit will then be known.
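The tag bit handshake can be sketched as follows, assuming the tag bit occupies the least significant bit of the left channel word, defaults to one, and is cleared to zero by the base to acknowledge an IR command. The function names are hypothetical, and the sketch ignores the timing of the tag bit window.

```python
TAG_SET = 0b0000000000000001      # OR pattern: default tag bit of one
TAG_CLEAR = 0b1111111111111110    # AND pattern: tag bit of zero (acknowledge)

def encode_left_word(word, ir_received):
    """Base unit side: mark the left word according to IR reception."""
    word &= 0xFFFF
    return word & TAG_CLEAR if ir_received else word | TAG_SET

def command_acknowledged(left_word):
    """Remote unit side: a zero tag bit means the base received the command."""
    return (left_word & 1) == 0

idle_frame = encode_left_word(0x1234, ir_received=False)
ack_frame = encode_left_word(0x1235, ir_received=True)
```

If `command_acknowledged` stays false after a command is sent, the remote takes no action and the user simply presses the button again, as described above.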

It should be understood that the examples and embodiments described herein are for
illustrative purposes only and that various modifications or changes in light thereof will be suggested
to persons skilled in the art and are to be included within the spirit and purview of this application
and the scope of the appended claims.

Claims

1. [illegible] least one channel has an audio component, wherein said method allows a user of
headphones to receive at least one processed audio component [illegible] has arrived from one
of a plurality of positions, determined by said processing, wherein said method comprises the
steps of:

(a) receiving the audio component of each said at least one channel;

(b) selecting, as a function of a user of headphones, a best-match set of head related
transfer functions (HRTFs) from a database of sets of HRTFs;

(c) processing the audio component of each said at least one channel via a
corresponding pair of digital filters, said pairs of digital filters filtering said audio
component as a function of the best-match [illegible] of digital filters generating a
processed left audio component and a processed right audio component;

(d) combining said processed left audio component from each said at least one channel
of the signal to form a composite processed left audio component;

(e) combining said processed right audio component from each said at least one
channel of the signal to form a composite processed right audio component;

(f) applying said composite processed left and right audio components to headphones
to create a virtual listening environment, wherein said user of headphones perceives
that the sound associated with each audio component has arrived from one of a
plurality of positions, determined by said processing.

2. The method, according to claim 1, wherein said database of sets of HRTFs is
[illegible]

3. [illegible] position of said plurality of positions is predetermined and corresponds to
one of said at least one channel.

4. [illegible] set of HRTFs, said method further comprises the step of selecting a position
subset of HRTFs [illegible] best-match set of HRTFs, each of the selected HRTFs of said
subset [illegible] so as to correspond to a virtual position closest to one of said predetermined
positions [illegible] user of said headphones perceives that the sound associated with each said
at least one channel originates from or near to said corresponding predetermined position.

5. The method according to claim 1, further comprising any one or all of the following
steps:

(a) processing the audio component of at least one of said at least one channel of the
signal via a bass boost circuit prior to processing said audio component of said at
least one channel via the pair of digital filters;

(b) prior to applying the composite processed left and right audio components to the
headphones, further processing the composite processed left audio component and
the composite processed right audio component via an ear canal resonator circuit.

6. The method according to claim 1, wherein said audio component of each said at
least one channel of the signal is processed such that said predetermined positions are specified by
a Dolby Pro Logic® audio component.



7. The method, according to claim 1, further comprising the steps of: 

(a) collecting a database of measured HRTFs; 

(b) ordering said database so that a representative subset of the entire collection of
HRTFs is obtained and stored in storage means; and

(c) selecting a best-match set of HRTFs from said storage means such that a user
performing said selecting perceives audio signals processed using said best-match
set of HRTFs in the proper spatial positions.

8. The method of claim 7 wherein said database is ordered by clustering said measured 

HRTFs. 



9. The method of claim 7 wherein said representative subset comprises between 15
and 25 HRTF sets.

10. The method of claim 8 wherein said database comprises S*L*2 spectra, with

L = the number of locations measured; and

S = the number of different subjects measured, wherein

16<S<200.




11. [illegible] match HRTF set via HRTF clustering further comprises the steps of:

(a) performing cluster analysis on the database of HRTF sets [illegible]
among the HRTF sets to order the HRTF sets into a [illegible]
there is defined a highest level cluster containing all the sets of HRTFs stored in the
database, wherein each cluster of HRTF sets contains either one HRTF set, only
HRTF sets which have no statistical difference between them, or a plurality of
sub-clusters of HRTF sets;

(b) selecting a representative HRTF set from each one of a [illegible] of sub-clusters of
the highest level cluster of HRTF sets;

(c) selecting a virtual target subset of HRTFs from each representative HRTF set,
wherein each position subset of HRTFs is associated with a predetermined virtual
target position;

(d) providing, to the user, a plurality of sound signals, each of said plurality of sound
signals being filtered by one of said plurality of position subsets of HRTFs;

(e) selecting, by the user, one of said plurality of sound signals as a function of
appropriate sound spatialization to said predetermined virtual target position, the
selected sound signal corresponding to the best-match cluster, wherein the
representative HRTF set of the best-match cluster defines the best-match HRTF
set.



12. The method according to claim 11, wherein each selected representative HRTF set
is [illegible] HRTF which most exemplifies the similarities between [illegible]
the sub-cluster of HRTF sets from which the representative HRTF set is selected.

13. The method according to claim 11, wherein each selected representative HRTF is
[illegible] from which the representative HRTF set is selected.

14. The method according to claim 11, wherein the step of matching the listener to the
best-match HRTF set via HRTF clustering further comprises the steps of:

(a) after selecting, by the user, one of said plurality of sound signals as a function of
said predetermined virtual target position, selecting a representative HRTF set from
each sub-cluster of the best-match cluster;

(b) selecting a subset of HRTFs from each representative HRTF set of each sub-cluster
of the best-match cluster, wherein each subset of HRTFs is associated with a
predetermined virtual target position;

(c) providing, to the user, a plurality of sound signals, each of said plurality of sound
signals filtered with one of said plurality of subsets of HRTFs corresponding to the
plurality of sub-clusters of the best-match cluster;

(d) selecting one of said plurality of sound signals as a function of a predetermined
virtual target position, the selected sound signal corresponding to the best-match
cluster, wherein the representative HRTF set of the best-match cluster defines the
best-match HRTF set;

(e) repeating steps a through d until the best-match cluster contains only one HRTF set
or contains only HRTF sets which have no statistical difference between them.

15. A device for processing a signal comprising at least one channel, wherein each said
at least one channel has an audio component, wherein said device processes each audio component
such that a user of headphones can receive the processed audio component from each said at least
one channel and perceive that the sound associated with each audio component has arrived from one
of a plurality of positions, said device comprising:

(a) at least one pair of digital filters, each pair of digital filters receiving an audio
component and applying a pair of head related transfer functions (HRTFs) to said
audio component, the HRTFs being determined as a function of a user of the
headphones from a database of sets of HRTFs, each pair of digital filters
generating a left signal and right signal;

(b) a first combining circuit combining the left signals for each said at least one
channel to form a left output signal; and

(c) a second combining circuit combining the right signals for each said at least one
channel to form a right output signal, the left and right output signals, when applied
to the headphones, creating a virtual listening environment wherein a user of said
headphones perceives that the sound associated with each audio component has
arrived from one of a plurality of positions, determined by said processing.

16. The device according to claim 15, further comprising any one or more of:

(a) a bass boost circuit coupled to at least one pair of digital filters, said bass boost
circuit increasing a low frequency [illegible] of a signal input to the bass boost circuit;

(b) an ear canal resonator circuit coupled to the left and right output signals; and

(c) a reverberation circuit coupled to at least one of said at least one channel, a first
output and a second output of the reverberation circuit being coupled to a
respective one of the first and second combining circuits.



17. A method for producing sound over headphones that is accurately spatialized for
a given user of the headphones which comprises:

(a) providing said user with a control device which controls a PROM programmed
with a database of representative HRTFs sets amenable to selection by said user
of a best-match HRTF set;

(b) transferring and storing said best-match HRTF set to RAM linked to a DSP; and

(c) processing an audio signal by said DSP using said best-match HRTF set and
transmitting said processed audio signal to said user for perception.



18. [illegible] processing comprises decoding [illegible] of signals prior to using said
best-match HRTF set and [illegible] bass boost processing, and any combination thereof.

19. The method according to claim 18, wherein said selection of said best-match HRTF
set comprises transmitting sound via headphones to a user from a main processing device
programmed with a plurality of HRTF sets [illegible] database of HRTF sets measured from a
sufficient number of individuals in the general population such [illegible] a statistical analysis of
the measured data reveals there [illegible] of sound spatialization [illegible]
HRTF sets were used to program said processing device, and allowing the user to identify a first
approximation of a best-match HRTF set by localizing sounds in pre-determined virtual locations.

20. The method according to claim 19 wherein said database of representative HRTFs
is selected from a database of measured HRTF sets, generated by measuring the individual HRTF
sets of at least sixteen individuals wherein said measuring is achieved using a single robot-arm
positioned sound source.

21. A device for producing sound over headphones that is accurately spatialized for a
given user of the headphones which comprises:

(a) a peripheral control device which controls a PROM programmed with a database
of representative HRTFs sets from amongst which said user is able to select a
best-match HRTF set; and

(b) a Random Access Memory (RAM) resident within a main processing device which
is programmed with said best-match HRTF set.

22. The device according to claim 21 comprising a means for wired or wireless
transmission of sound processed by said main processing device programmed with said best-match
HRTF set.

23. The device according to claim 22 wherein said sound is a digital signal and said
means for wireless transmission is a digital processing means comprising:

(a) a filtering means for removing the DC component from said digital signal;

(b) a first inverting means for inverting every other bit of said digital signal; and

(c) an encoding means for encoding a locking bit into said digital signal.

24. The device according to claim 23 wherein any one or more of the following apply:

(a) said digital signal is a binary digital signal;

(b) said filtering means is an adaptive filter;

(c) said filtering means is a high-pass filter;

(d) said first inverting means is an exclusive OR gate having as inputs said digital
signal and a digital bit stream comprising alternating ones and zeroes (...101010...);
and

(e) said encoding means is an AND gate having as input said digital signal and a
repeating sequence of (...111111...10...), wherein said AND gate encodes a zero as
a locking bit every nth bit, where n is an integer.

25. The device according to claim 24 wherein said digital signal is comprised of digital 
words, wherein said locking bit is encoded into the least significant bit location of each digital word 
into which it is encoded. 

26. The device, according to claim 25 wherein said locking bit is encoded into each
digital word as the terminal bit of each said digital word into which it is encoded.

27. The device, according to claim 24 further comprising:

(a) a transmitting means for transmitting said digital signal; and 

(b) a receiving means for receiving said digital signal. 

28. The device, according to claim 27, wherein said receiving means comprises: 

(a) a first locking means for locking onto the bit rate of said received digital signal; 

(b) a second locking means for locking onto the locking bit of said received digital 
signal; and 

(c) a second inverting means for inverting said previously inverted bits. 

29. The device, according to claim 28, wherein any or all of the following apply: 

(a) said first locking means is a phase locked loop; 

(b) said second locking means is a state machine; and 

(c) said transmitting and receiving means are wireless. 

30. A device for rapidly and accurately generating a database of HRTF sets based on 
measurements fi-om a large number of individuals comprising: 

(a) a single, robot-arm positioned sound source; 

(b) a robot-arm for positioning said single sound source; 

(c) a measurement control system; and 

(d) transducers for measuring sound and distortions thereof as it is received at each ear
of an individual whose HRTF sets are being measured, after being generated by
said single sound source at various locations about the individual wearing said
transducers.

31. The device of claim 30 wherein said transducers are positioned at the entrance of the
outer ear canal of the individual whose HRTF sets are being measured.

32. A device for spatializing sound over headphones which comprises:

(a) a means for storing a representative set of HRTFs selected from a
database of measured HRTFs;

(b) a means for a user to select a set of HRTFs from said means for storing
said representative set of HRTFs; and

(c) a means for processing audio signals using said set of HRTFs selected by
the user such that the user perceives the corresponding sounds to be
localized on the proper spatial positions;

wherein said database of measured HRTFs comprises S*L*2 spectra, with

L = the number of locations measured, and

S = the number of different subjects measured, wherein

16<S<200.
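
The database size and the processing of element (c) can be made concrete with a short Python sketch. It counts the S*L*2 spectra (one per subject, location and ear) and mixes several channels to a left/right pair by filtering each channel with the pair selected for its location. The function names, the dictionary layout, and the use of time-domain impulse responses in place of the stored spectra are assumptions made for this illustration only.

```python
def spectra_count(num_subjects, num_locations):
    """Number of spectra in the database: one per subject, location and ear."""
    if not 16 < num_subjects < 200:
        raise ValueError("claim 32 recites 16 < S < 200")
    return num_subjects * num_locations * 2


def convolve(signal, impulse_response):
    """Direct full-length FIR convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def spatialize(channels, hrir_set):
    """Mix channels to (left, right) using one impulse-response pair per location."""
    length = max(
        len(sig) + max(len(hrir_set[loc][0]), len(hrir_set[loc][1])) - 1
        for loc, sig in channels.items()
    )
    left, right = [0.0] * length, [0.0] * length
    for loc, sig in channels.items():
        ir_left, ir_right = hrir_set[loc]
        for k, v in enumerate(convolve(sig, ir_left)):
            left[k] += v
        for k, v in enumerate(convolve(sig, ir_right)):
            right[k] += v
    return left, right


if __name__ == "__main__":
    print(spectra_count(20, 100))
    channels = {"front_left": [1.0, 0.0], "front_right": [0.0, 1.0]}
    selected = {
        "front_left": ([1.0, 0.5], [0.25, 0.0]),
        "front_right": ([0.25, 0.0], [1.0, 0.5]),
    }
    print(spatialize(channels, selected))
```

The values in `selected` are placeholders, not measured data; a real set would come from the stored representative HRTFs chosen by the user.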



33. The method according to claim 17 wherein said signal is a digital signal and said
transmitting comprises:

(a) removing the DC component of said digital signal if present;

(b) inverting every other bit of said digital signal; and

(c) encoding a locking bit into said digital signal.
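
The three steps above can be sketched at the level of digital words rather than single bits. The example assumes 16-bit words, a first-order high-pass filter as the DC remover, an alternating pattern of 0xAAAA, and a zero locking bit placed in the least significant bit of every nth word (the location named in claim 25). All names and parameter values are hypothetical choices for illustration.

```python
WORD_MASK = 0xFFFF    # assumed 16-bit word length
ALTERNATING = 0xAAAA  # 1010...10 across one 16-bit word


def remove_dc(samples, r=0.995):
    """First-order DC blocker: y[k] = x[k] - x[k-1] + r * y[k-1]."""
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


def to_word(sample):
    """Round a sample and wrap it into an unsigned 16-bit two's-complement word."""
    return int(round(sample)) & WORD_MASK


def invert_alternate_bits(word):
    """Invert every other bit of one word by XOR with the alternating pattern."""
    return (word ^ ALTERNATING) & WORD_MASK


def encode_locking_bit(words, n):
    """Clear the least significant bit of every nth word (zero locking bit)."""
    return [(w & ~1) & WORD_MASK if (i + 1) % n == 0 else w
            for i, w in enumerate(words)]


def prepare_for_transmission(samples, n=1):
    """Apply steps (a), (b) and (c) in order."""
    filtered = remove_dc(samples)
    words = [invert_alternate_bits(to_word(s)) for s in filtered]
    return encode_locking_bit(words, n)


if __name__ == "__main__":
    print([hex(w) for w in prepare_for_transmission([0.0, 100.0, 100.0, 100.0], n=2)])
```

With the 0xAAAA pattern the least significant bit is not one of the inverted bits, so under these assumptions the locking bit is unaffected when the receiver repeats the XOR to undo the inversion.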

34. The method according to claim 33, wherein any one or more of the following apply:

(a) said digital signal is a binary digital signal;

(b) said removing of said DC component is achieved by adaptive filtering;

(c) said removing of said DC component is achieved by high-pass filtering;

(d) said inverting of every other bit of said digital signal is accomplished by exclusive
ORing said digital signal with a digital bit stream comprising alternating ones and
zeroes (...101010...);

(e) said encoding of a locking bit into said digital signal is achieved by encoding said
locking bit in a certain bit location of every nth word comprising said digital signal,
wherein n is an integer; and

(f) said encoding of a locking bit into said digital signal is achieved by encoding said
locking bit at every nth bit of said signal wherein said locking bit is always a one or
always a zero and wherein n is an integer.

35. The method, according to claim 17, further comprising the steps of:

(a) transmitting said digital signal; and

(b) receiving said digital signal to produce a received digital signal.

36. The method according to claim 35, wherein said receiving step comprises:

(a) locking onto the bit rate of said received digital signal;

(b) locking onto the locking bit of said received digital signal; and

(c) inverting the previously inverted bits.

37. The method, according to claim 36, wherein any one or more of the following apply:

(a) said locking onto said bit rate of said received digital signal is accomplished by a
phase locked loop; and

(b) said locking onto said locking bit of said received digital signal is accomplished
with a state machine.



38. A storage means encoded with a database of HRTFs such that HRTFs appropriate
for a particular individual may be retrieved from such storage means to act as a filter in digital
processing of an audio signal transmitted to headphones for accurate sound spatialization